Abstract 

Given a 2-SAT formula $F$ consisting of $n$ variables and $\lfloor cn \rfloor$ random 
clauses, what is the largest number of clauses $\max F$ satisfiable by a 
single assignment of the variables? We bound the answer away from 
the trivial bounds of $\frac34 cn$ and $cn$. We prove that for $c < 1$, the 
expected number of clauses satisfiable is $\lfloor cn \rfloor - \Theta(1/n)$; for large $c$, it 
is $(\frac34 c + \Theta(\sqrt{c}))n$; for $c = 1+\varepsilon$, it is at least $(1+\varepsilon - O(\varepsilon^3))n$ and at most 
$(1+\varepsilon - \Omega(\varepsilon^3/\ln(1/\varepsilon)))n$; and in the "scaling window" $c = 1 + \Theta(n^{-1/3})$, 
it is $cn - \Theta(1)$. In particular, just as the decision problem undergoes 
a phase transition, our optimization problem also undergoes a phase 
transition at the same critical value $c = 1$. 

Nearly all of our results are established without reference to the 
analogous propositions for decision 2-SAT, and as a byproduct we re- 
produce many of those results, including much of what is known about 
the 2-SAT scaling window. 

We consider "online" versions of max 2-SAT, and show that for one 
version, the obvious greedy algorithm is optimal. 

We can extend only our simplest max 2-SAT results to max k-SAT, 
but we propose a "max k-SAT limiting function conjecture" 
analogous to the folklore satisfiability threshold conjecture, and open 
even for k = 2. Neither conjecture immediately implies the other, but 
it is natural to further conjecture a connection between them. 

Finally, for random max cut (the size of a maximum cut in a 
sparse random graph) we prove analogous results. 
1 Introduction 



In this paper, we consider random instances of MAX 2-SAT, MAX k-SAT, and 
MAX CUT. Just as random instances of the decision problem 2-SAT show a 
phase transition from almost-sure satisfiability to almost-sure unsatisfiability 
as the instance "density" increases above 1, so the maximization problem 
shows a transition at the same point, with the expected number of clauses 
not satisfied by an optimal solution quickly changing from $\Theta(1/n)$ to $\Theta(n)$. 
MAX CUT experiences a similar phase transition: as a random graph's edge 
density crosses above $1/n$, the number of edges not cut in an optimal cut 
suddenly changes from $\Theta(1)$ to $\Theta(n)$. 

Our methods are well established ones: the first-moment method for up- 
per bounds; algorithmic analysis including the differential-equation method 
for lower bounds; and some more sophisticated arguments for the analysis 
of the scaling window. The interest of the work lies in the simplicity of the 
methods, and in the results. The questions we ask seem very natural, and 
the answers obtained for max 2-sat and max cut are happily neat, and 
fairly comprehensive. 

A preliminary version of this paper appeared as [CGHS03]. 

1.1 Outlook 

Beyond our particular results for max 2-sat and max cut, we hope to spark 
further work on phase transitions in random instances of other optimiza- 
tion problems, in particular of max CSPs (constraint satisfaction problems). 
Random instances of optimization problems have been studied extensively 
— some that come to mind are the travelling salesman problem, minimum 
spanning tree, minimum assignment, minimum bisection, minimum color- 
ing, and maximum clique — but little has been said about phase transitions 
in such cases, and indeed many of the examples do not even have a natural 
parameter whose continuous variation could give rise to a phase transition. 

Many problems, including all CSPs, have natural decision and optimization 
versions: one can ask whether a graph is k-colorable, or ask for the 
minimum number of colors it requires. We suggest that in a random setting, 
the optimization version is quite as interesting as the decision version. 
Furthermore, optimization problems may plausibly be easier to analyze than 
decision problems because the quantities of interest vary more smoothly. In 
fact, a recent triumph in the analysis of a decision problem, the characterization 
of the "scaling window" for 2-SAT, used as a smoothed quantity the size 
of the "spine" of a formula [BBC+01]. A way to view our MAX 2-SAT results 
is that instead of taking the size of the spine as our "order parameter", we 
take the size of a maximum satisfiable subformula. This seems comparably 
tractable (we reproduce the result of [BBC+01] incompletely, but more easily), 
and arguably more natural. Generally, when a decision problem has an 
optimization analog, the value of the optimum is both interesting in its own 
right, and, we suggest, an obvious candidate order parameter for studying 
the decision problem. 



1.2 Problem and motivations 

Let $F$ be a k-SAT formula with $n$ variables $X_1, \ldots, X_n$. An "assignment" of 
these variables consists of setting each $X_i$ to either 1 (True) or 0 (False); we 
may write an assignment as a vector $X \in \{0, 1\}^n$. k-SAT is well understood. 
In particular, it is a canonical NP-hard problem to determine if a given 
formula $F$ is satisfiable or not, except for $k = 2$ when this decision problem 
is solvable in essentially linear time. 

Random instances of k-SAT have recently received wide attention. Let 
$\mathcal{F}(n, m)$ denote the set of all formulas with $n$ variables and $m$ clauses, where 
each clause is proper (consisting of $k$ distinct variables, each of which may be 
complemented or not), and clauses may be repeated. Let $F \in \mathcal{F}$ be chosen 
uniformly at random; this is equivalent to choosing $m$ clauses uniformly at 
random, with replacement, from the $2^k \binom{n}{k}$ possible clauses. 
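As an illustration of this sampling model, here is a minimal Python sketch. The function name `random_formula` and the encoding of a literal as a signed integer (`+i` for $X_i$, `-i` for its complement) are assumptions made for the example, not notation from the paper.

```python
import random

def random_formula(n, m, k=2, rng=None):
    """Draw m proper k-clauses uniformly at random, with replacement.

    Hypothetical encoding: a literal is a nonzero int, +i for X_i and
    -i for its complement; a clause is a tuple of k such literals.
    """
    if rng is None:
        rng = random.Random()
    formula = []
    for _ in range(m):
        # k distinct variables, so every clause is "proper"
        variables = rng.sample(range(1, n + 1), k)
        # each variable is complemented or not with equal probability
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        formula.append(clause)
    return formula
```

Because each clause is drawn independently, repeated clauses can occur, as the model allows. Calling `random_formula(n, int(c * n))` would give an instance of density `c`.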

The model is generally parametrized as $F \in \mathcal{F}(n, cn)$ for various "densities" 
$c$, and the state of knowledge is summarized thus. The 2-SAT case is 
well understood: for $c < 1$, $F$ is almost surely satisfiable (a.s., in the limit 
$n \to \infty$), and for $c > 1$, $F$ is a.s. unsatisfiable [CR92, Goe96, FdlV92]. 
Recently, the "scaling window" $c = 1 \pm \Theta(n^{-1/3})$ has also been analyzed 
[BBC+01]. For k-SAT, much less is known. For 3-SAT, for instance, it 
is known that for $c < 3.42$, $F$ is a.s. satisfiable [KKL02] and for $c > 4.6$, $F$ 
is a.s. unsatisfiable [JSV00]. It is only conjectured, though, that for $k = 3$ 
(and for all $k$) the situation is similar to that for $k = 2$. 

Conjecture 1 (Satisfiability Threshold Conjecture) For each $k$ there 
exists a threshold density $c_k$, such that for any positive $\varepsilon$, for all $c < c_k - \varepsilon$, 
a random formula $F$ is a.s. satisfiable, and for all $c > c_k + \varepsilon$, $F$ is a.s. 
unsatisfiable. 

For large values of $k$, although the question of a threshold remains open, 
satisfiability and unsatisfiability density bounds are asymptotically equal, as 
shown by an analysis in [AM02] and refined in [AP03]. The closest result to 
the satisfiability conjecture is a theorem of Friedgut [Fri99] proving similar 
thresholds, but leaving open the possibility that (for a given $k$), each $n$ may 
have its own threshold, and that these may not converge to a limit. 

Theorem 2 (Friedgut) For each $k$ there exists a threshold density function 
$c_k(n)$, such that for any positive $\varepsilon$, as $n \to \infty$, for all $c < c_k(n) - \varepsilon$, 
a random formula $F$ is a.s. satisfiable, and for all $c > c_k(n) + \varepsilon$, $F$ is a.s. 
unsatisfiable. 

Having briefly surveyed random k-SAT, let us similarly consider MAX k-SAT. 
For a given formula $F$, let $F(X)$ be the number of clauses satisfied 
by $X$. The problem MAX 2-SAT asks for $\max F = \max_X F(X)$, i.e., the 
maximum, over all assignments $X$, of the number of clauses satisfied; 
equivalently, the size (number of clauses) of a maximum satisfiable subformula of $F$. 
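To make the definition of $\max F$ concrete, the following brute-force sketch enumerates all $2^n$ assignments. It is only meant for tiny instances, and the names `clause_satisfied` and `max_sat`, as well as the signed-integer literal encoding, are illustrative assumptions.

```python
from itertools import product

def clause_satisfied(clause, assignment):
    """A clause is satisfied if at least one of its literals is true.

    Literal +i stands for X_i, literal -i for its complement
    (an encoding chosen for this sketch only).
    """
    return any((lit > 0) == assignment[abs(lit)] for lit in clause)

def max_sat(formula, n):
    """Return max F: the largest number of clauses any assignment satisfies."""
    best = 0
    for bits in product([False, True], repeat=n):
        assignment = dict(zip(range(1, n + 1), bits))
        satisfied = sum(clause_satisfied(c, assignment) for c in formula)
        best = max(best, satisfied)
    return best
```

For example, the four clauses `[(1, 2), (1, -2), (-1, 3), (-1, -3)]` on `n = 3` variables cannot all be satisfied (the first two force $X_1$ true, the last two force it false), and `max_sat` returns 3 for them.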

In the maximization setting, even 2-SAT is interesting. MAX 2-SAT is 
NP-hard to solve exactly, and it is even NP-hard to approximate $\max F$ to 
within a factor of 21/22 [Has97]. On the other hand, a 3/4-approximation 
is trivial: a random assignment satisfies an expected 3/4ths of the clauses, 
and a derandomized algorithm is simple (our algorithm used to prove the 
lower bound for Theorem 4 can serve). The best known approximation ratio 
achievable in polynomial time is 0.940 [LLZ02]. For arbitrary 3-SAT formulas 
$F$, in polynomial time, $\max F$ can be approximated to within a factor of 
7/8 [KZ97], but no better (unless P=NP) [Has97]. 

Although both randomized and maximization versions of k-SAT are thus 
well studied, we are aware of no work on random MAX SAT, nor other random 
MAX or MIN constraint satisfaction problems (CSPs). These problems seem 
very natural, and answers to even the simplest questions are not obvious at 
first blush: For a random 2-SAT formula $F(n,cn)$ with $c > 1$, which is a.s. 
unsatisfiable, can we perhaps w.h.p. satisfy all but a single clause? 

These questions have elegant answers; we will show for example that 
random MAX 2-SAT has a phase structure analogous to the decision problem's. 
And there is a hope that the maximization problems may help in 
understanding the decision problems. For 2-SAT this hope is borne out to 
a degree by our Theorem 6. While our results for $k > 2$ are very limited 
(see Theorem ITK]), Conjectures [T2l and ITU link the open questions for 
the maximization and decision thresholds for random satisfiability. At this 
point we cannot guess the comparative difficulties of resolving the satisfiability 
threshold conjecture, its maximization analog, or the conjectured link 
between them. 

Our study of random MAX 2-SAT and random MAX CUT was also moti- 
vated by recent work on "avoiding a giant component" ; we will discuss this 
in section IHI 
We consider several aspects of random MAX 2-SAT and random MAX CUT. 
We also extend the easiest results to arbitrary CSPs (constraint satisfaction 
problems). 

We will give a second motivation for considering problems of this sort 
when we take up MAX CUT, in section |S| 



2 Notation and model 

We write $F(n, m)$ to denote a random 2-SAT formula on $n$ variables, with $m$ 
clauses. Typically we will fix a constant $c$ and consider $F(n, \lfloor cn \rfloor)$; where 
it does not matter we will often write $cn$ in lieu of $\lfloor cn \rfloor$, and we often omit 
the notation $\lfloor \cdot \rfloor$ in other instances. For any formula $F$, define $\max F$ to be 
the size of a largest satisfiable subformula of $F$. Our focus is the functional 
behavior of $\max F$, and in particular of its expectation, which we write 
$f(n, m) = E(\max F(n, m))$. 

Similarly, we write $G(n, m)$ for a random graph on $n$ vertices with $m$ 
edges. For any graph $G$, let $X$ describe a partition of the vertices, and 
let $\mathrm{cut}(G, X)$ be the number of edges having one vertex in each part of 
the partition. Define $\mathrm{maxcut}(G) = \max_X \mathrm{cut}(G, X)$, and 
$f_{\mathrm{cut}}(n, m) = E(\mathrm{maxcut}(G(n, m)))$. 
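The quantity $\mathrm{maxcut}(G)$ can likewise be computed by exhaustive search on very small graphs. The sketch below assumes vertices are numbered `0..n-1` and edges are given as pairs; the helper names `cut_size` and `max_cut` are hypothetical.

```python
from itertools import product

def cut_size(edges, side):
    """Number of edges whose endpoints lie in different parts."""
    return sum(side[u] != side[v] for u, v in edges)

def max_cut(n, edges):
    """Brute-force maxcut(G) for a graph on vertices 0..n-1 (n >= 1)."""
    best = 0
    # Fix vertex 0 in part 0: swapping the two parts leaves the cut unchanged.
    for bits in product([0, 1], repeat=n - 1):
        side = (0,) + bits
        best = max(best, cut_size(edges, side))
    return best
```

A triangle has maximum cut 2, while a 4-cycle is bipartite and so has all 4 edges cut by the best partition.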

We use standard asymptotic and "order" notation, so for example $f(n) \sim g(n)$ 
means $f(n)/g(n) \to 1$ as $n \to \infty$, and $f(n) = o(n)$ means $f(n)/n \to 0$. 
We will also write $f(n) \lesssim g(n)$ to indicate that $f$ is less than or equal to $g$ 
asymptotically, that is, $\limsup f(n)/g(n) \le 1$, though it may be that $f(n) > g(n)$ 
even for arbitrarily large values of $n$. Asymptotic results involving 
two variables, for example concerning 2-SAT formulae on $n$ variables with 
$cn$ clauses, with $c$ large (or $(1 + \varepsilon)n$ clauses with $\varepsilon$ small) should always 
be interpreted as taking the limit in $n$ second; thus "for any desired error 
bound there exists a $c_0$, such that for all $c > c_0$ there exists an $n_0$, such 
that for all $n > n_0$," etcetera. 



3 Summary of results 

We establish several properties of random MAX 2-SAT, random MAX k-SAT, 
and random MAX CUT, focusing on 2-SAT. This section summarizes our main 
results and indicates the nature of the proofs; further results and proofs are 
given in subsequent sections. 

One of our goals is to establish the MAX 2-SAT results without depending 
on those for decision 2-SAT, and in particular to work independently 
of [BBC+01] and reproduce its results. We enjoy some success in this; the 
exceptions are our reliance on [BBC+01] for the upper bound in Theorem 5 
(with an extraneous logarithmic factor arising in the translation), and a 
more acute form of the same problem in the scaling window, where we lack 
any corresponding bound for the $\lambda > 1$ case of Theorem 6. 

Figure 1 shows an "artist's rendition" of our results for 2-SAT. For 
$c < 1$, we expect to satisfy nearly all clauses, while for $c \to \infty$, we expect 
to satisfy only about 3/4ths of them. The asymptotic behavior for $c < 1$ is 
understood; so is that for $c$ large (with a log-factor gap in the bounds on 
the second term); and for $c = 1 \pm \Theta(n^{-1/3})$ (with only a one-sided bound on 
the second term). We now state these results more exactly; we prove them 
in the next section. 



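[Figure 1: plot of $f(n,cn)/(cn)$ (vertical axis, ranging between 3/4 and 1) against the density $c$ (horizontal axis, extending to $\infty$).] 

Figure 1: "Artist's rendition" of the behavior of $f(n,cn)/(cn)$. 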

For $c < 1$ a random formula $F(n,cn)$ is satisfiable w.h.p., so we would 
expect $\max F$ to be close to $cn$ in this case; the following theorem shows 
this to be true. 

Theorem 3 For $c = 1 - \varepsilon$, with any constant $\varepsilon > 0$, 
$\lfloor cn \rfloor - f(n, \lfloor cn \rfloor) = \Theta(1/(\varepsilon^3 n))$. 

The proof comes from counting the expected number of the "bicycles" 
shown by [CR92] to be necessary components of an unsatisfiable formula. 

For any $c$, $f(n, cn) \ge \frac34 cn$, since a random assignment of the variables 
satisfies each clause with probability $\frac34$. The next theorem shows that neither 
this bound nor the trivial upper bound $cn$ is tight, although for large $c$, $\frac34 cn$ 
is close to correct. 
Theorem 4 For $c$ large, 

$$\left(\sqrt{c}\,\frac{\sqrt{8}-1}{3\sqrt{\pi}} - O(1)\right) n \;\lesssim\; f(n,cn) - \tfrac34 cn \;\lesssim\; \left(\sqrt{c}\,\sqrt{3\ln(2)/8}\right) n.$$

The values of $(\sqrt{8}-1)/(3\sqrt{\pi})$ and $\sqrt{3\ln(2)/8}$ are approximately 0.343859 and 
0.509833, respectively. The upper bound is proved by a simple first-moment 
argument, and the lower bound by analyzing an algorithm; both techniques 
are exactly those demonstrated in [Spe94, Lecture 6] to analyze the 
Gale-Berlekamp switching game. 
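The two numerical constants quoted for Theorem 4 can be checked directly. The sketch below assumes the lower-bound constant has the closed form $(\sqrt{8}-1)/(3\sqrt{\pi})$, which is the value of $\sqrt{2/\pi}\int_0^1 \sqrt{t/4 + (1-t)/8}\,dt$ obtained from the variance computed in the lower-bound proof; that reading of the garbled source formula is an assumption, supported by its agreement with the quoted value 0.343859. The function `integral_constant` is a hypothetical helper using a plain midpoint rule.

```python
import math

# Closed forms (the lower one is an assumed reconstruction, see the lead-in).
lower_const = (math.sqrt(8) - 1) / (3 * math.sqrt(math.pi))
upper_const = math.sqrt(3 * math.log(2) / 8)

def integral_constant(steps=10000):
    """Midpoint-rule value of sqrt(2/pi) * integral_0^1 sqrt(t/4 + (1-t)/8) dt."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.sqrt(t / 4 + (1 - t) / 8)
    return math.sqrt(2 / math.pi) * total * h
```

Both `lower_const` and `integral_constant()` agree with the quoted 0.343859 to the displayed precision, and `upper_const` with 0.509833.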

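Our next results relate to the low-density case, when $c$ is above but close 
to the critical value 1. How does $f(n, cn)$ depend on $c = 1 + \varepsilon$ for small $\varepsilon$? 

Theorem 5 For any fixed $\varepsilon > 0$, $(1 + \varepsilon - \varepsilon^3/3)n \lesssim f(n, (1 + \varepsilon)n)$; also, 
there exist absolute constants $\alpha_0$ and $\varepsilon_0$, such that for any fixed $0 < \varepsilon < \varepsilon_0$, 
$f(n, (1 + \varepsilon)n) \lesssim (1 + \varepsilon - \tfrac13 \alpha_0 \varepsilon^3/\ln(1/\varepsilon))n$. 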

That is, a constant fraction of the clauses must remain unsatisfied, but 
this fraction, $\varepsilon^3/3$ at most, is surprisingly small. The lower bound is 
proved by using the "differential equation method" (see for example [Wor95]) 
to exactly analyze a version of the unit-clause heuristic. The upper bound's 
proof is a simple first-moment argument; however, for the probability that 
a sub-formula with density > 1 is satisfiable, it requires the exponentially 
small bound given by Bollobás et al. [BBC+01] (see Theorem 8 below). It 
is likely that, by replacing our use of [BBC+01] with structural properties 
of the kernel of a sparse random graph, the upper bound's $\varepsilon^3/\ln(1/\varepsilon)$ can 
be replaced by $\varepsilon^3$ to match the lower bound up to constants [JS02]. 



The major significance of [BBC+01] was to determine the "scaling window" 
for random 2-SAT. Without using their result, we prove an analogous 
result for MAX 2-SAT, and incidentally reproduce most parts of their 2-SAT 
result. 



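Theorem 6 Letting $c = 1 + \varepsilon = 1 + \lambda n^{-1/3}$, we have 

$$\lfloor cn \rfloor - f(n, \lfloor cn \rfloor) = \begin{cases} O(\lambda^3) & \text{if } \lambda > 1; \\ \Theta(1) & \text{if } -1 \le \lambda \le 1; \\ \Theta(|\lambda|^{-3}) & \text{if } \lambda < -1. \end{cases}$$

Furthermore, for $\lambda > 1$, for some positive absolute constant, and any $\beta > 0$, 
$\Pr\left((\lfloor cn \rfloor - f(n, \lfloor cn \rfloor)) > \mathrm{const}\,\beta\lambda^3\right) \le \exp(-\beta\lambda^3)$. 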
Also, 

$$\Pr(F(n, cn) \text{ is satisfiable}) = \begin{cases} \exp(-\Theta(\lambda^3)) & \text{if } \lambda > 1; \\ \Theta(1) & \text{if } -1 \le \lambda \le 1; \\ 1 - \Theta(|\lambda|^{-3}) & \text{if } \lambda < -1. \end{cases}$$

In particular, in the scaling window $c = 1 \pm \lambda n^{-1/3}$, a random formula 
is satisfiable with probability which is bounded away from 0 and 1 (the 
exact bounds depending on $\lambda$), and it can be made satisfiable by removing 
a constant-order number of clauses (the constant depending on $\lambda$). 

In section 121 for MAX k-SAT, we derive analogous results only for $c$ large, 
reflecting the general state of ignorance regarding the k-SAT phase transition. 
(For some results on scaling windows for k-SAT see [Wil02].) Still more 
generally, a later theorem describes the high-density case for any MAX CSP. 
More interestingly, for random MAX k-SAT (including $k = 2$) we observe 
that $\max F$ is concentrated about its expectation $f(n, cn)$ (as previously 
remarked in [BFU93]) and that $f(n, cn)/(cn)$ is monotone non-increasing in $c$. 
Were $f(n,cn)/(cn)$ also monotone in $n$, an important property analogous 
to the satisfiability conjecture would follow; we present this as a conjecture 
for general MAX CSPs. 

In section 13 we consider online versions of MAX 2-SAT, for one of which 
we prove that a natural greedy algorithm is optimal. 

Results for the MAX CUT problem for sparse random graphs, which is 
closely analogous to random MAX 2-SAT, are presented in section |H1 



4 Random MAX 2-SAT 

4.1 Sub-critical MAX 2-SAT 

One of the most basic facts concerning MAX 2-SAT is that for constants 
$c < 1$, the expected number of clauses unsatisfied is $o(1)$. This is refined by 
Theorem 3, which shows the number to be $\Theta(1/(\varepsilon^3 n))$. We now prove the 
theorem. 

Theorem 3, Proof. We write the proof in the SAT equivalent of the 
"$G(n,p)$" model, because the expressions for the probability of a clause's 
presence are cleaner in this model, but adaptation to the $G(n, m)$ model is 
immediate. 

A k-bicycle (see Figure 2) is a sequence of clauses $\{u, w_1\}, \{\bar w_1, w_2\}, \ldots, \{\bar w_k, v\}$ 
where literals $w_1, w_2, \ldots, w_k$ are distinct as variables (none is 
the same as nor the complement of another) and $u \in \{w_i, \bar w_i\}$, $v \in \{w_j, \bar w_j\}$ 
for some $1 \le i, j \le k$. (Think of it as a "walk" in which the first and last 
variables are also both visited en route.) Because satisfying a clause $\{\bar u, v\}$ 
means that if $u$ is true then $v$ must be true, such a clause yields an implication 
$u \to v$ (and a complementary implication $\bar v \to \bar u$); Figure 2 represents 
such a sequence of implications for a bicycle. Chvátal and Reed [CR92] argue 
that if a formula is infeasible then it contains a bicycle. Thus if we 
delete a clause from every bicycle, the remaining subformula is satisfiable. 

Figure 2: Sequence of clause-derived implications for a bicycle. Start the 
walk from $u$, proceed clockwise to $w_i$ (which equals either $u$ or $\bar u$), continue 
right to $w_j$, and again go clockwise to terminate at $v$ (which equals either 
$w_j$ or $\bar w_j$). 

The number of potential k-bicycles, whether or not present in a given formula 
$F$, is at most $k^2 (2n)^k$. The probability that all $k + 1$ clauses of a given 
bicycle are present in a random formula $F$ is at most 
$[(cn)/(2^2 \binom{n}{2})]^{k+1} = [c/(2(n-1))]^{k+1}$, so the expected number of 
k-bicycles is $\lesssim k^2 c^{k+1}/(2n)$. If 
we delete one clause in every bicycle, we obtain a satisfiable formula. For any 
fixed $c < 1$, $\sum_k k^2 c^{k+1}/(2n) = O(1/n)$. Thus, the expected number of 
clauses we need to delete is at most $O(1/n)$ and $f(n, \lfloor cn \rfloor) \ge \lfloor cn \rfloor - O(1/n)$. 
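The dependence of this $O(1/n)$ bound on $\varepsilon = 1 - c$ can be made explicit with the standard power-series identity $\sum_{k \ge 1} k^2 x^k = x(1+x)/(1-x)^3$, which gives $\sum_{k\ge 1} k^2 c^{k+1} = c^2(1+c)/(1-c)^3 \approx 2/\varepsilon^3$ for small $\varepsilon$, in line with the $1/(\varepsilon^3 n)$ of Theorem 3. The following sketch, with hypothetical helper names, compares a truncated sum with the closed form.

```python
def bicycle_sum(c, terms=10000):
    """Truncated sum of k^2 * c^(k+1) over k = 1..terms (assumes 0 < c < 1)."""
    return sum(k * k * c ** (k + 1) for k in range(1, terms + 1))

def closed_form(c):
    """Closed form c^2 (1 + c) / (1 - c)^3 of the infinite sum."""
    return c * c * (1 + c) / (1 - c) ** 3
```

For instance, at `c = 0.5` both give 3, and `closed_form(1 - eps) * eps**3` approaches 2 as `eps` shrinks.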

To bound the number of unsatisfied clauses from below, we show that with 
probability at least $\Omega(1/(\varepsilon^3 n))$ the formula $F$ is not satisfiable. This clearly 
implies the upper bound $f(n, \lfloor cn \rfloor) \le \lfloor cn \rfloor - \Omega(1/(\varepsilon^3 n))$. To this end we employ the 
second moment method. 

For simplicity here, we will restrict ourselves to 3-bicycles, which will 
only establish "$\Theta_\varepsilon(1/n)$", that is, something of order $\Theta(1/n)$ but with hidden 
constants that may depend on $\varepsilon$. The full proof is the same but using 
bicycles of lengths up to $1/\varepsilon$, not just length 3, and parallels the proof of 
Theorem 6, case $\lambda \le -1$. (In fact, taking $\lambda = \lambda(n) = -\varepsilon n^{1/3}$ there establishes 
the current theorem completely.) 

Consider 4-tuples of clauses of the form $\{u_1, u_2\}, \{\bar u_2, u_1\}, \{\bar u_1, u_3\}, \{\bar u_3, \bar u_1\}$, 
where $u_1, u_2, u_3$ are arbitrary variables. One observes that this sequence 
of clauses is a 3-bicycle, and, moreover, its presence in the random formula 
$F$ implies non-satisfiability. We now show, using the second moment 
method, that the number $B_3$ of such bicycles is at least one with probability 
at least $\Omega(1/n)$. We have $E[B_3^2] = \sum P(X \in F, X' \in F)$, where 
the sum runs over the pairs of 3-bicycles $X, X'$ of the form above, and 
$X \in F$ means all the clauses of $X$ are present in $F$. We decompose the 
sum into three parts: the sum over pairs $X, X'$ with $X = X'$, the sum 
over pairs that do not have common clauses, and the rest. It is easy to see 
that the first sum is simply $E[B_3]$, which is $\Theta(1/n)$ by the argument for the 
upper bound. To analyze the second sum note that for each fixed pair $X, X'$ 
with no common clauses, we have $P(X, X' \in F) = P(X \in F)\,P(X' \in F)$ 
when replacement of clauses is allowed. (When replacement is not allowed 
the reader can check that the difference between the left and the right-hand 
sides is very small, and the rest of the argument goes through.) 
Then this sum is smaller than $\sum_{X,X'} P(X \in F)\,P(X' \in F) = (E[B_3])^2$, 
where the sum now runs over all the pairs $X, X'$. For the third sum 
we have two cases. The first case is pairs $X, X'$ defined on the same set 
of variables. For example $X = \{u_1, u_2\}, \{\bar u_2, u_1\}, \{\bar u_1, u_3\}, \{\bar u_3, \bar u_1\}$ and 
$X' = \{u_1, u_2\}, \{\bar u_1, u_2\}, \{\bar u_2, u_3\}, \{\bar u_3, \bar u_2\}$ share one clause $\{u_1, u_2\}$ and 
are defined over the same set of variables. There are $O(n^3)$ choices for the 
variables $u_1, u_2, u_3$ in these pairs. But since $X \ne X'$, there are altogether 
at least five clauses in $X$ and $X'$ together. For a given pair, the 
probability that all these clauses are present in $F$ is $O(1/n^5)$. Then the 
expected number of such pairs $X, X' \in F$ is $O(1/n^2) = o(1/n)$. 

The second case is pairs $X, X'$ defined over different sets of variables. 
Since they share a clause, the pair is defined on exactly four variables. 
But then there are at least six clauses in this pair. We obtain that 
the expected number of such pairs $X, X'$ which belong to $F$ is at most 
$O(n^4)\,O(1/n^6) = O(1/n^2) = o(1/n)$. 

We conclude that $E[B_3^2] = E[B_3] + o(1/n) = \Theta(1/n) + o(1/n)$. We 
now use the bound $P(Z \ge 1) \ge (E[Z])^2/E[Z^2]$, which holds for any non-negative 
integer random variable $Z$. Applying this bound to $B_3$ we obtain 
$P(B_3 \ge 1) \ge (E[B_3])^2/E[B_3^2] \ge \Theta(1/n^2)/(\Theta(1/n) + o(1/n)) = \Theta(1/n)$. This 
completes the proof. □ 

It is worth pointing out the following simple fact, upon which we will 
shortly improve. 

Remark 7 For $c > 1$, $f(n,cn) \gtrsim n(\frac34 c + \frac14)$. 

Proof. It suffices to show that for any $\varepsilon > 0$, for all $n$ sufficiently large, 
$f(n,cn) \ge (\frac34 c + \frac14 - \varepsilon)n$. Select the first $(1 - \varepsilon)n$ clauses, and let $X$ be a best 
assignment for them. By Theorem 3, $X$ satisfies an expected $(1 - \varepsilon)n - o(1)$ of 
these first clauses. Also, an expected 3/4ths of the remaining $(c - 1 + \varepsilon)n$ 
clauses are satisfied, yielding the claim. □ 



4.2 High-density random MAX 2-SAT 

While it is well known that for $c > 1$, $F(n, cn)$ is a.s. unsatisfiable, is it possible 
that even for $c$ large, almost all clauses are satisfiable? Theorem 4 rules 
this out by showing that a constant fraction of clauses must go unsatisfied; 
up to a constant, it also provides a matching lower bound. 

Theorem 4, Proof of the upper bound. The proof is by the first-moment 
method. If $\max F \ge (1 - r)cn$ then there is a satisfying assignment of a 
subformula $F'$ which omits $rcn$ or fewer clauses, and where (taking $F'$ to 
be maximal) all the omitted clauses are unsatisfied. Any fixed assignment 
satisfies each (random) clause of $F'$ w.p. 3/4 and unsatisfies each omitted 
clause w.p. 1/4, so by linearity of expectations, the probability that there 
exists such an $F'$ is 

$$P = P(\exists \text{ satisfiable } F') \le 2^n \sum_{k=0}^{rcn} \binom{cn}{k} (3/4)^{cn-k} (1/4)^{k}.$$

For $r < 1/4$ the sum is dominated by the last term. From Stirling's formula 
$n! \sim \sqrt{2\pi n}\,(n/e)^n$, 

$$\binom{cn}{rcn} \sim \frac{1}{\sqrt{2\pi r(1-r)cn}} \left(r^{-r}(1-r)^{-(1-r)}\right)^{cn}. \tag{1}$$

Substituting (1) into the previous expression, 

$$P \lesssim \frac{1}{\sqrt{2\pi r(1-r)cn}}\; 2^n \left(r^{-r}(1-r)^{-(1-r)}(3/4)^{1-r}(1/4)^{r}\right)^{cn}.$$

Substituting $r = 1/4 - \varepsilon$, 

$$\frac{1}{cn}\ln P \lesssim \ln(2)/c - (8/3)\varepsilon^2 + O(\varepsilon^3),$$

so that for $\varepsilon > \sqrt{(3/8)\ln 2/c}$, as $n \to \infty$, $P \to 0$. The conclusion follows. 
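As a numerical sanity check of this first-moment calculation, one can evaluate the per-clause exponent exactly rather than through its expansion in $\varepsilon$. The sketch below assumes the exponent has the form $\ln(2)/c + H(r) + (1-r)\ln(3/4) + r\ln(1/4)$, with $H(r) = -r\ln r - (1-r)\ln(1-r)$, and ignores the polynomial prefactor; the function name `exponent` is hypothetical.

```python
import math

def exponent(c, r):
    """(1/(cn)) * ln of the first-moment bound, without the polynomial prefactor.

    Assumed form: ln(2)/c + H(r) + (1 - r) ln(3/4) + r ln(1/4),
    where H is the natural-log entropy function and 0 < r < 1.
    """
    entropy = -r * math.log(r) - (1 - r) * math.log(1 - r)
    return (math.log(2) / c + entropy
            + (1 - r) * math.log(0.75) + r * math.log(0.25))
```

With `c = 100` and `eps_star = sqrt((3/8) ln 2 / c)`, the exponent is negative at `r = 1/4 - eps_star` (so the bound tends to 0) and positive at `r = 1/4 - eps_star / 2` (so the bound says nothing there).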

Proof of the lower bound. The proof is algorithmic. When variables 
$X_1, \ldots, X_k$ have been set, define the reduced formula $F_k$ in which any clause 
containing a True literal is removed and "scored", and False literals are 
removed from the remaining clauses. (Clauses with no variables remaining are 
permanently unsatisfied.) Define a potential function $q(F_k)$ to be the number 
of clauses already satisfied, plus 3/4 the number of 2-variable clauses 
("2-clauses"), plus 1/2 the number of 1-variable clauses ("unit clauses"). 
Note that randomly assigning the remaining variables satisfies an expected 
total number of clauses precisely $q(F_k)$, so $q$ is a lower bound on the number 
of clauses satisfiable. 

After variables $X_1, \ldots, X_{k-1}$ have been set to define $F_{k-1}$, our algorithm 
sets $X_k$ in whichever of the two ways gives an $F_k$ with larger value $q(F_k)$. 
(Ties may be broken arbitrarily.) In $F_{k-1}$, let the number of appearances 
of $X_k$ and $\bar X_k$ in unit clauses be denoted by $A_1$ and $\bar A_1$, and their number 
of appearances in 2-clauses by $A_2$ and $\bar A_2$. If $X_k$ is set to True, then 

$$q(F_k) - q(F_{k-1}) = \Delta_k := \tfrac12(A_1 - \bar A_1) + \tfrac14(A_2 - \bar A_2),$$

and if $X_k$ is set False, then $q(F_k) - q(F_{k-1}) = -\Delta_k$. Note that $q(F_k) = 
q(F_{k-1}) + |\Delta_k|$ is a lower bound on the number of satisfiable clauses, and 
$q(F_0) = \frac34 cn$. 
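The greedy rule described above is easy to state in code. The following is a minimal sketch, not the authors' implementation: it assumes proper 2-clauses encoded as pairs of signed integers (`+i` for $X_i$, `-i` for its complement), reads the increment as $\frac12(A_1 - \bar A_1) + \frac14(A_2 - \bar A_2)$, and breaks ties toward True. The function name `greedy_max2sat` is hypothetical.

```python
def greedy_max2sat(formula, n):
    """Set X_1..X_n in order, each time choosing the value that does not lower q.

    formula: list of pairs of nonzero ints on two distinct variables.
    Returns (assignment dict, number of satisfied clauses).
    """
    value = {}                                      # variable -> bool, once set
    occurrences = {v: [] for v in range(1, n + 1)}  # variable -> clause indices
    for idx, clause in enumerate(formula):
        for lit in clause:
            occurrences[abs(lit)].append(idx)
    satisfied = [False] * len(formula)

    for k in range(1, n + 1):
        delta = 0.0
        for idx in occurrences[k]:
            if satisfied[idx]:
                continue                            # already scored
            a, b = formula[idx]
            lit, other = (a, b) if abs(a) == k else (b, a)
            # If the other variable is set, its literal must be False here,
            # so the clause is a unit clause (weight 1/2); else a 2-clause (1/4).
            weight = 0.5 if abs(other) in value else 0.25
            delta += weight if lit > 0 else -weight
        value[k] = delta >= 0                       # ties broken toward True
        for idx in occurrences[k]:
            a, b = formula[idx]
            lit = a if abs(a) == k else b
            if (lit > 0) == value[k]:
                satisfied[idx] = True
    return value, sum(satisfied)
```

Since the potential starts at 3/4 of the number of clauses and never decreases, the returned count is always at least 3/4 of the clauses, which is the derandomized 3/4-approximation. On the unsatisfiable example `[(1, 2), (1, -2), (-1, 3), (-1, -3)]` with `n = 3` it satisfies 3 of the 4 clauses.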

With $k-1$ variables already set, $F_{k-1}$ a.s. has 
$(\frac12 \pm O(1/\sqrt{c})) \cdot 2\,\frac{k-1}{n}\,\frac{n-k+1}{n}\,cn$ unit clauses, 
and $(\frac{n-k+1}{n})^2 cn$ 2-clauses, on the remaining variables. (The 
reason for $\frac12 \pm O(1/\sqrt{c})$ instead of $\frac12$ is that we set the previous variables 
in a biased manner.) Also, conditioned on the number of clauses, $F_{k-1}$ is a 
uniformly random formula (each "slot" being equally likely to be filled by 
any of the remaining literals). For $n$ large, $A_1$ and $\bar A_1$ are approximated by 
independent Poisson random variables with parameter $(\frac12 \pm O(1/\sqrt{c}))\frac{k-1}{n} c$, 
and $A_2$ and $\bar A_2$ by Poissons with parameter $\frac{n-k+1}{n} c$. By assumption, $c$ is 
large, so each of these distributions is approximately Gaussian, and their 
weighted sum $\Delta_k$ is also approximately Gaussian, with mean 0 (by symmetry) and 
variance 

$$\sigma^2 = 2\cdot(\tfrac12)^2\cdot\mathrm{Var}(A_1) + 2\cdot(\tfrac14)^2\cdot\mathrm{Var}(A_2) = c\left((\tfrac14 \pm O(1/\sqrt{c}))\,\tfrac{k-1}{n} + \tfrac18\,\tfrac{n-k+1}{n}\right).$$

For $Z \sim N(0, 1)$, it is well known that $E|Z| = \sqrt{2/\pi}$; thus 
$E|\Delta_k| = \sigma\sqrt{2/\pi} \pm O(1)$. Finally, 

$$f(n,cn) \ge \tfrac34 cn + \int_{k=0}^{n} E(|\Delta_k|)\,dk = \tfrac34 cn + \left(\sqrt{c}\,\frac{\sqrt{8}-1}{3\sqrt{\pi}} - O(1)\right) n.$$

□ 



We remark that in the preceding proof, $X_k$ was set True or False so as to 
maximize half the number of satisfied unit clauses plus a quarter the number 
of satisfied 2-clauses. This is reminiscent of the "policies" in [AS00]. There, 
the goal was to satisfy as dense a 3-SAT formula as possible; unit clauses 
always had to be satisfied, and variables were set so as to maximize a linear 
combination of the number of satisfied 2-clauses and 3-clauses. In [AS00], 
the linear combination which was optimal for the purpose changed during the 
course of the algorithm; the determination of the optimal combinations, and 
the proof of optimality, was a main result of the paper. In the present case, 
though, it is evident that the ratio 1:2 is optimal: for $c$ large, the potential 
function $q$ predicts the expected number of clauses satisfiable almost exactly. 
The difference can be ascribed to the fact that here $c$ is "large", and in [AS00] 
the corresponding parameter (the initial 3-clause density) was fixed (relevant 
values were in the range of 3.145 to 3.26). Were we to try to tune the MAX 
2-SAT algorithm above for small values of $c$, more complex methods like 
those of [AS00] would presumably be needed. 

4.3 Low-density random MAX 2-SAT 

For low-density formulas, with $c = 1 + \varepsilon$ and $\varepsilon > 0$ a small constant, the 
bounds of Theorem 4 are inapplicable. It is still true (from Remark 7) that 
we expect to satisfy at least $(1 + \frac34\varepsilon)n$ clauses, but it is not obvious whether 
the best answer is this, or close to the full number of clauses $(1 + \varepsilon)n$, or 
something in between. In this section we prove Theorem 5, which shows that 
$(1 + \varepsilon)n - f(n, cn)$, the number of clauses we must dissatisfy, lies between 
$\Theta(\varepsilon^3 n/\ln(1/\varepsilon))$ and $\Theta(\varepsilon^3 n)$. That is, a linear fraction of clauses must be 
rejected, but this fraction, at most $\Theta(\varepsilon^3)$, is surprisingly small. We will 
employ the following theorem of Bollobás et al. [BBC+01] on random 2-SAT. 
Theorem 8 ([BBC+01], Corollary 1.5) There exist positive constants $\alpha_0$ 
and $\varepsilon_0$ such that for any $0 < \varepsilon < \varepsilon_0$ and sufficiently large $n$, 
$P[F(n, (1 + \varepsilon)n) \text{ is satisfiable}] \le \exp(-\alpha_0 \varepsilon^3 n)$. 

(Here, $\alpha_0$ is the liminf of the constant implicit in the theorem 
in [BBC+01].) The $\exp(-\Theta(\varepsilon^3 n))$ probability of satisfiability in random 
2-SAT translates into an expected $\Omega(\varepsilon^3 n/\ln(1/\varepsilon))$ unsatisfied clauses in random 
MAX 2-SAT. 

Theorem 5, Proof of the upper bound. The proof is by the first-moment 
method. Let $c = 1 + \varepsilon$. Let $F'$ range over subformulas of $F$ which omit $rcn$ 
or fewer clauses. Specifying $r < 1/4$, the conditions of Theorem 8 apply, so 

$$P = P(\exists \text{ maximally satisfiable } F') \lesssim \sum_{k=0}^{rcn} \binom{cn}{k} (\tfrac14)^{k}\, e^{-\alpha_0(\varepsilon - k/n)^3 n}; \tag{2}$$

as $r < 1/4$, the sum is dominated by the last term. Using (1) to approximate 
$\binom{cn}{crn}$, 

$$\frac{1}{cn}\ln P \lesssim -r\ln r - (1 - r)\ln(1 - r) - \alpha_0(\varepsilon - cr)^3/c - r\ln(4).$$

First observe that as $\varepsilon \to 0$, for any $r = o(\varepsilon)$, this is 

$$= -r\ln r\,(1 + o(1)) - \alpha_0\varepsilon^3(1 + o(1)) - r\ln(4).$$

For any constant $b < 1/3$, if $r = b\alpha_0\varepsilon^3/\ln(1/\varepsilon)$, this is 

$$= 3b\alpha_0\varepsilon^3(1 + o(1)) - \alpha_0\varepsilon^3(1 + o(1)) < 0.$$

That is, it is unlikely that asymptotically fewer than $(1/3)\alpha_0\varepsilon^3 n/\ln(1/\varepsilon)$ 
clauses can go unsatisfied. 

Proof of the lower bound. The proof is algorithmic, and of the sort familiar 
from [AS00] and previous works. It analyzes a version of the "unit-clause" 
heuristic. Initially, "seed" the algorithm by randomly deleting a variable 
from each of, say, n^^^^ random 2-clauses to convert them to unit clauses. 
While F has any unit clauses, select one at random and set its variable 
to satisfy the clause. Continue until no unit clauses remain. The analysis 
consists of counting the clauses unsatisfied in these steps, and justifying the 
assertion that when there are no more unit clauses, o(1) further clauses need 
be unsatisfied. 

When $k$ variables have been set, let the number of 2-clauses be denoted 
$m_2(k)$, the number of unit clauses $m_1(k)$, and the number of unset variables 
$m(k) = n - k$. In one step, the changes in these quantities are $\Delta m = -1$, 
$E(\Delta m_2) = -\frac{2}{m} m_2$, and $E(\Delta m_1) = -1 - \frac{1}{m} m_1 + \frac{1}{m} m_2$ (assuming that 
$m_1 > 0$ before the step). Over a large number of steps, the net changes 
will be a.s. a.e. equal to the expectations. Renormalizing with $\rho = m/n$, 
$\rho_1 = m_1/n$, and $\rho_2 = m_2/n$, the differential equation method (see for 
example [AS00, Wor95]) asserts that $(\rho_1, \rho_2)$ a.s. a.e. obey the differential 
equations 

$$\frac{d\rho_2}{d\rho} = \frac{2\rho_2}{\rho}, \qquad \frac{d\rho_1}{d\rho} = 1 + \frac{\rho_1}{\rho} - \frac{\rho_2}{\rho}.$$

With boundary conditions that for $\rho = 1$ (i.e., initially), $\rho_2 = c$ and $\rho_1 = 0$, 
the unique solution is 

$$\rho_2 = c\rho^2, \qquad \rho_1 = c\rho - c\rho^2 + \rho\ln\rho.$$

This results in $\rho_1 = 0$ at two times: initially, when $\rho = 1$, and also for 
$\rho = \rho^*$ satisfying 

$$c = \ln(\rho^*)/(\rho^* - 1). \tag{3}$$

While $\rho > \rho^*$, the only clauses ever unsatisfied are unit clauses which 
contain the negation of the variable being set, and the expected number of 
such rejected clauses per step is $\frac12 \frac{m_1}{m} = \frac{\rho_1}{2\rho}$. Integrating over the period $\rho^*$ 
to 1, 

$$\int_{\rho^*}^{1} \frac{\rho_1}{2\rho}\,d\rho = \frac12 \int_{\rho^*}^{1} (c - c\rho + \ln\rho)\,d\rho = \frac12 \left(c\rho - c\rho^2/2 + \rho\ln\rho - \rho\right)\Big|_{\rho^*}^{1}$$

which, substituting for $c$ from (3), 

$$= \tfrac12(\rho^* - 1) - \tfrac14(\rho^* + 1)\ln\rho^*. \tag{4}$$

So from $\rho = 1$ to $\rho = \rho^*$, the number of clauses dissatisfied by the 
algorithm is a.s. a.e. $n$ times expression (4). After this time, the remaining 
(uniformly random) 2-SAT formula has density $\rho_2(\rho^*)/\rho^* = c\rho^* = 
\ln(\rho^*)\rho^*/(\rho^* - 1) < 1$, since $\ln(1/\rho^*) < 1/\rho^* - 1$ for $\rho^* < 1$, and thus (by 
Theorem 3) contributes $o(1)$ to the expected number of unsatisfied clauses. In 
short, the algorithm a.s. fails to satisfy a.e. $(\frac12(\rho^* - 1) - \frac14(\rho^* + 1)\ln\rho^*)n$ 
clauses. For $\rho^*$ (asymptotically) close to 1, the number of dissatisfied clauses 
is $\sim n(1 - \rho^*)^3/24$. In particular, with $\varepsilon > 0$ asymptotically small and 
$c = 1 + \varepsilon$, $1 - \rho^* \sim 2\varepsilon$, and the number of dissatisfied clauses is $\sim \varepsilon^3 n/3$. 

□ 

Two remarks. First, in addition to the asymptote, the proof gives a precise
parametric relationship (as functions of $p^*$) between the clause density
$c$ (given by (3)) and the rejected-clause density (given by (4)). Solving
numerically, for $c = 1.5$ we find rejected-clause density $\approx 0.0183275$, and for
$c = 2$ (where naively the rejected-clause density would be $\frac14 c = 0.5$) we
achieve rejected-clause density $\approx 0.0809517$.
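The parametric relationship is easy to evaluate numerically. The following Python sketch is only an illustration (the function names `p_star` and `rejected_density` are ours, not the paper's): it finds $p^*$ from (3) by bisection, assuming $c > 1$, and then evaluates expression (4), read here as $\frac12(p^* - 1) - \frac14(p^* + 1)\ln p^*$.

```python
import math

def p_star(c):
    """Root p* in (0, 1) of c = ln(p) / (p - 1), by bisection; assumes c > 1."""
    g = lambda p: math.log(p) - c * (p - 1.0)
    # g is increasing on (0, 1/c), with g(exp(-c)) < 0 < g(1/c) when c > 1.
    lo, hi = math.exp(-c), 1.0 / c
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rejected_density(c):
    """Expression (4): (p* - 1)/2 - (p* + 1) ln(p*) / 4."""
    p = p_star(c)
    return 0.5 * (p - 1.0) - 0.25 * (p + 1.0) * math.log(p)

for c in (1.5, 2.0):
    print(c, p_star(c), rejected_density(c))
```

For $c = 1.5$ and $c = 2$ the computed densities should agree with the values quoted above to roughly five decimal places.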

Second, with the solution in hand, the asymptotic behavior is easy to 
see without the need for differential equations. This alternate proof is not 
fully rigorous, but is more intuitive and more robust; it is the basis of the 
analysis within the scaling window (see Section 5.2).

Theorem [S] Alternate proof of lower bound. Consider what happens when 
$m = (1 - \delta)n$ variables remain unset. The number of 2-clauses is a.s.
$m_2 \approx (1 - \delta)^2 (1 + \varepsilon) n \approx (1 + \varepsilon - 2\delta) n$. The expected increase in the number
of unit clauses is then $E(\Delta m_1) = -1 - m_1/m + m_2/m \le -1 + m_2/m$
(and the neglected $m_1/m$ is not only conservative, but will also prove to be
insignificantly small). Thus,
$E(\Delta m_1) \le -1 + [(1 + \varepsilon - 2\delta) n]/[(1 - \delta) n] \approx \varepsilon - \delta$.
At $\delta = 0$, the number of unit clauses increases by $\varepsilon$ per step, this increase
linearly falls to 0 per step by $\delta = \varepsilon$, and further to $-\varepsilon$ by $\delta = 2\varepsilon$: the
expected number of unit clauses is bounded by an inverted parabola, with
base $2\varepsilon n$ and height $\frac12 \varepsilon^2 n$. At each step about $1/(2n)$th of the unit clauses
get dissatisfied. The area under the parabola, times this $1/(2n)$ factor, is
$\frac23 \cdot \text{base} \cdot \text{height} \cdot 1/(2n) = \frac13 \varepsilon^3 n$. □
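As a rough numerical illustration of the parabola argument (a sketch under the heuristic's own assumptions, with a hypothetical function name), one can take the drift of the unit-clause count to be $\varepsilon - \delta$ per step, dissatisfy a $1/(2n)$ fraction of the current unit clauses at each step, and sum over the $2\varepsilon n$ steps. The total should be close to $\varepsilon^3 n/3$.

```python
def parabola_estimate(n, eps):
    """Sum m1(t)/(2n) over t < 2*eps*n, where m1 has drift eps - t/n per step."""
    m1 = 0.0          # nominal number of unit clauses
    rejected = 0.0    # accumulated expected number of dissatisfied unit clauses
    for t in range(round(2 * eps * n)):
        rejected += m1 / (2 * n)
        m1 += eps - t / n
    return rejected

n, eps = 10**6, 0.01
estimate = parabola_estimate(n, eps)
print(estimate, eps**3 * n / 3)
```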

5 The MAX 2-SAT scaling window

For random MAX 2-SAT, we have seen that for fixed $c < 1$,
$\lfloor cn \rfloor - f(n, \lfloor cn \rfloor) = \Theta(1/n)$, and for $c > 1$, $cn - f(n, cn) = \Theta(n)$. That is, random
MAX 2-SAT experiences a phase transition around $c = 1$. It is natural to ask
about the scaling window around the critical threshold: What is the interval
around $c = 1$ within which $\lfloor cn \rfloor - f(n, \lfloor cn \rfloor) = \Theta(1)$? The theorem proved in
this section shows that the scaling window is $c = 1 \pm \Theta(n^{-1/3})$. The corresponding question
for random 2-SAT is the range in which $P(F(n, \lfloor cn \rfloor) \text{ is satisfiable}) = \Theta(1)$.
This was shown by [BBC+01] to be $c = 1 \pm \Theta(n^{-1/3})$, with their result
reproduced as Theorem 9 here.

Theorem 9 (Bollobás et al. [BBC+01]) Let $F(n, cn)$ be a random 2-SAT
formula, with $c = 1 + \lambda_n n^{-1/3}$. There are absolute constants $0 < \varepsilon_0 < 1$,
$0 < \lambda_0 < \infty$, such that the probability $F$ is satisfiable is: $1 - \Theta(1/|\lambda_n|^3)$,
when $-\varepsilon_0 n^{1/3} \le \lambda_n \le -\lambda_0$; $\Theta(1)$, when $-\lambda_0 \le \lambda_n \le \lambda_0$; and
$\exp(-\Theta(\lambda_n^3))$, when $\lambda_0 \le \lambda_n \le \varepsilon_0 n^{1/3}$.

That the two scaling windows are the same is no coincidence, and in fact
our scaling-window theorem reestablishes much of Theorem 9 independently.

Theorem Proof. Note that, provided we prove the bounds for the cases 
$\lambda \le -1$ and $\lambda \ge 1$, the bound for the case $|\lambda| < 1$ follows immediately, since
we obtain that the probability of satisfiability is at least $\exp(-\Theta(\lambda^3)) \ge
\exp(-\Theta(1))$ and at most $1 - \Theta(1/|\lambda|^3) \le 1 - \Theta(1)$, where in both cases
$|\lambda| < 1$ was used. The more interesting cases $|\lambda| \ge 1$ are considered in two
subsections below.



5.1 Case $c = 1 + \lambda n^{-1/3}$, $\lambda \le -1$

For convenience we write $c = 1 - \lambda n^{-1/3}$ and $\lambda \ge 1$. The proof for this
case is very similar to that of the theorem for fixed $c < 1$ and uses the notion of bicycles.
(As in the earlier case, we work in the equivalent of the $G(n,p)$ model for
notational convenience, with the understanding that the proof works equally
well in the $G(n,m)$ model.) As before, the number of clauses that must be
dissatisfied is bounded by the number of bicycles. The expected number
of $k$-bicycles is at most $k^2 c^{k+1}/(2n) = k^2 (1 - \lambda n^{-1/3})^{k+1}/(2n)$. Using the
formula $\sum_{k \ge 1} k^2 p^k = p(1+p)/(1-p)^3$, which for $p \approx 1$ is $\approx 2/(1-p)^3$, we have

(5) $\sum_{1 \le k < \infty} k^2 (1 - \lambda n^{-1/3})^{k+1}/(2n) \approx 2/\lambda^3$.

Therefore $\lfloor cn \rfloor - f(n, \lfloor cn \rfloor) = O(1/\lambda^3)$. Using Markov's inequality we also
obtain that the probability that the formula is unsatisfiable is at most the
expected number of bicycles, that is, at most $O(1/\lambda^3)$.
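The series identity behind this estimate can be checked numerically. The sketch below is illustrative only (the helper names are ours): it compares a truncated sum of $k^2 p^k$ with the closed form $p(1+p)/(1-p)^3$, and then, for $p = 1 - \lambda n^{-1/3}$, shows that the sum is of order $n/\lambda^3$, which is what makes the expected number of bicycles $O(1/\lambda^3)$.

```python
def k2_series(p, terms=5000):
    """Truncated sum of k^2 p^k for k = 1..terms (the tail is negligible for p <= 0.99)."""
    return sum(k * k * p**k for k in range(1, terms + 1))

def closed_form(p):
    """p (1 + p) / (1 - p)^3."""
    return p * (1 + p) / (1 - p) ** 3

p = 0.99
print(k2_series(p), closed_form(p))

n, lam = 10**6, 2.0
p = 1 - lam * n ** (-1 / 3)
scaled = closed_form(p) * lam**3 / n   # tends to 2 as p -> 1
print(scaled)
```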

We now obtain a matching lower bound. Consider only "bad" bicycles,
in which $u = w_i$, $v = \bar{w}_j$, and $i < j$. Note that no bad bicycle is completely
satisfiable, since the first "wheel" $\bar{u} \to \cdots \to w_i = u$ requires $\bar{u}$ = False
and thus $w_i$ = True; whereupon the path (technically called the "top tube"
of a bicycle) $w_i \to \cdots \to w_j$ implies $w_j$ = True; and the second wheel
$w_j \to \cdots \to v = \bar{w}_j$ provides a contradiction. Note that about 1/8th of the
potential bicycles are bad.
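The unsatisfiability of bad bicycles can be confirmed by brute force on a small case. The sketch below assumes the usual bicycle clause list $(u \vee w_1), (\bar{w}_1 \vee w_2), \ldots, (\bar{w}_k \vee v)$ and encodes literals as signed integers; the function names are hypothetical.

```python
from itertools import product

def bicycle_clauses(k, u, v):
    """Clauses (u or w1), (not w1 or w2), ..., (not wk or v); literal -x means 'not w_x'."""
    clauses = [(u, 1)]
    clauses += [(-i, i + 1) for i in range(1, k)]
    clauses.append((-k, v))
    return clauses

def satisfiable(k, clauses):
    """Try all 2^k assignments of w_1..w_k."""
    for bits in product((False, True), repeat=k):
        val = lambda lit: bits[abs(lit) - 1] == (lit > 0)
        if all(val(a) or val(b) for a, b in clauses):
            return True
    return False

k = 5
bad_pairs = [(i, j) for i in range(1, k + 1) for j in range(i + 1, k + 1)]
# u = w_i and v = not w_j with i < j: every such bicycle should be unsatisfiable.
all_bad_unsat = all(not satisfiable(k, bicycle_clauses(k, i, -j)) for i, j in bad_pairs)
print(all_bad_unsat)
```

By contrast, with the order reversed ($i > j$) the same construction can be satisfiable, which is why only about one eighth of the choices of $u$ and $v$ are counted as bad.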

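Let $B_k$ denote the number of the bad $k$-bicycles. Since

(6) $E(\#\text{unsatisfiable clauses}) \ge \Pr(F \text{ unsatisfiable})$

it suffices to prove that this is

(7) $= \Omega(1/\lambda^3)$;

we will show this for $K = (1/\lambda) n^{1/3}$. Repeating the argument for (5) we
obtain that

$$E\Bigl[\sum_{k \le K} B_k\Bigr] \ge (2/(8e))/\lambda^3,$$

the $1/(8e)$ coming from the series' truncation at $K$ and the use of only bad
bicycles. To obtain (7) it suffices to prove that

(8) $E\bigl[(\sum_{k \le K} B_k)^2\bigr] = (1 + O(1)) \cdot E\bigl[\sum_{k \le K} B_k\bigr]$,

for then, by the second-moment inequality $\Pr(X > 0) \ge (EX)^2/E(X^2)$ applied to
$X = \sum_{k \le K} B_k$, the probability that some bad bicycle is present (and hence
that $F$ is unsatisfiable) is $\Omega(1/\lambda^3)$.

We will prove (8) with $O(1/\lambda^3)$ filling in for $O(1)$ (recall that $\lambda \ge 1$).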
Consider pairs of $k$-, $k'$-bicycles $X, X'$ with $k, k' \le K$. It suffices to show
that for every $X$,

(9) $\sum_{X' \ne X} \Pr(X' \subseteq F \mid X \subseteq F) = O(1/\lambda^3)$,

because then

$$E\Bigl[\bigl(\sum_k B_k\bigr)^2\Bigr] = \sum_{X, X'} \Pr(X, X' \subseteq F)
= \sum_X \Pr(X \subseteq F) \Bigl[1 + \sum_{X' \ne X} \Pr(X' \subseteq F \mid X \subseteq F)\Bigr]
\le E\Bigl[\sum_k B_k\Bigr] \bigl(1 + O(1/\lambda^3)\bigr).$$

Establishing (9) is the nub of the proof. First, observe that for any
bicycle $X'$ sharing no literals with $X$, $\Pr(X' \subseteq F \mid X \subseteq F) \le \Pr(X' \subseteq F)$,
and so such bicycles $X'$ contribute $\le \sum_k E B_k = O(1/\lambda^3)$ to the sum.

Given a bicycle $X' = \{u', w_1\}, \{\bar{w}_1, w_2\}, \ldots, \{\bar{w}_k, v'\}$, a sequence of
literals $w_i, w_{i+1}, \ldots, w_j$ from $X'$ is defined to be a type I excursion if literals
$w_i, w_j$ belong to $X$ but literals $w_{i+1}, \ldots, w_{j-1}$ do not. (If $j = i + 1$, a
sequence $w_i, w_{i+1}$ is a type I excursion if the corresponding clause
$(\bar{w}_i, w_{i+1}) \in X'$ does not belong to $X$.) A sequence of literals $u', w_1, \ldots, w_j$ in $X'$
is defined to be a type II excursion if the literal $w_j$ belongs to $X$, but
$u', w_1, \ldots, w_{j-1}$ do not. Similarly, a sequence $w_j, w_{j+1}, \ldots, v'$ in $X'$ is
defined to be a type III excursion.

Bicycles $X'$ which are neither equal to $X$ nor disjoint from $X$ must
have at least one excursion (and at most one each of excursions of type II
and III). It suffices to establish (9) for such bicycles $X'$. We will just show
that the expected number of bicycles $X'$ with one type II excursion, no
type III excursion, and any number $r \ge 0$ of type I excursions, is $O(1/\lambda^3)$;
the other three cases (classified by the number of type II and III excursions)
follow similarly.

Since a collection of excursions uniquely defines $X'$, it is enough to
count such collections. Let the lengths of the type I excursions be
$m_1, m_2, \ldots, m_r \ge 2$ and that of the type II excursion $m_{II}$, where the length
is defined by the number of literals.

For each type I excursion there are two endpoints (literals) which belong
to $X$. Since the size of $X$ is $\le K = (1/\lambda) n^{1/3}$, there are $\le K^{2r} = (1/\lambda^{2r}) n^{2r/3}$
choices for all the end points. The $i$th type I excursion contains $m_i - 2$
literals not from $X$, so there are at most $(2n)^{m_i - 2}$ ways of selecting them.
The excursion contains $m_i - 1$ clauses, all not from $X$, so the probability
they are all present in $F$ is $(1 - \lambda n^{-1/3})^{m_i - 1}/(2n)^{m_i - 1}$.

Similarly, for the type II excursion, there are at most $K$ choices for the
endpoint literal $w_{j-1}$, which belongs to $X$, and at most $(2n)^{m_{II} - 1}$ choices
for other literals $u', w_1, \ldots, w_{j-2}$. The excursion contains $m_{II} - 1$ clauses,
all not from $X$, so the probability that they are all present in $F$ is
$(1 - \lambda n^{-1/3})^{m_{II} - 1}/(2n)^{m_{II} - 1}$.

Combining, we obtain that the expected number of bicycles $X'$ containing
exactly $r$ type I excursions, one type II excursion, and no type III
excursions is

2r+2 



#(r,0,l)^ (l/A2'-+2)n^(2n)'""-2+S,-.-2'^ 

m//,mi,...,m,. 5^2 

(1 — Xn^'^/^)'^ll^'^+T,i'mi-r 

1 

= i -r- V (1- An-l/3Nm„+E,m.-r-l_ 

9r+l\2r+2„^ ^ ^ ^ 

^ A ' n 3 m/7,mi,...,m^^2 



Note that 



(1 - An-i/=^)'""+S 

mij ,m,i,...,mr^2 

J2 (1 - An-i/3)-//+E 



rtii—r—l 



mjj,mi,...,mr^l 




Applying this to the equality above we obtain

#(r, 0, 1) ^ ^ , , ^ „ , „ , and 

g/''-'°'^' SA3(l-2A3) =°"/^'>- 

With similar calculations for $\#(r, \cdot, \cdot)$ this establishes (9), and completes
the proof of the case $\lambda \le -1$ of the theorem. □

5.2 Case $c = 1 + \lambda n^{-1/3}$, $\lambda \ge 1$

The proof of this part resembles the alternate proof of the lower bound for
$c = 1 + \varepsilon$. There we showed that $m_1(t)$ a.s. a.e. followed a parabolic trajectory.
Both there and here, at time $t = \varepsilon n$, the expectation given by the parabola is
$\frac12 \varepsilon^2 n$, and the typical deviations (the standard deviation) from summing $\varepsilon n$ binomial
r.v.s with distributions near to $B(n, 1/n)$ is about $\sqrt{\varepsilon n}$.

In the previous case, with $\varepsilon = \Theta(1)$, the deviations were a.s. tiny compared
with the expectation, but here, with $\varepsilon = \lambda n^{-1/3}$, the standard deviation
of $\lambda^{1/2} n^{1/3}$ is of the same order (in terms of $n$) as the expectation
of $\frac12 \lambda^2 n^{1/3}$: the trajectory is not predictable in an a.s. a.e. sense. Figure 3
shows two typical samples (with $\lambda = 2$ and $n = 10{,}000$) against the nominal
parabolic trajectory. The analysis is thus more involved.




Figure 3: Nominal parabolic trajectory of $m_1(t)$ vs $t$, and two random
samples for density $1 + \lambda n^{-1/3}$ ($\lambda = 2$, $n = 10{,}000$). With density
$1 + \Theta(n^{-1/3})$, the random fluctuations are of the same order as the nominal
values.

As before, we analyze the unit-clause resolution algorithm in which if
there are any unit clauses (if $m_1(t) > 0$) we choose one at random and set
its literal True, and otherwise we choose a random literal (from the variables
not already set) and set it True.
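The algorithm just described can be sketched in a few lines of Python. This is an illustration only, not the paper's implementation: the function names, the signed-integer encoding of literals, and the naive rescan of the formula at every step (quadratic time, adequate for small experiments) are all assumptions made here.

```python
import random

def random_2sat(n, m, rng):
    """m random 2-clauses over variables 1..n; each clause uses two distinct variables."""
    formula = []
    for _ in range(m):
        x, y = rng.sample(range(1, n + 1), 2)
        formula.append((x * rng.choice((1, -1)), y * rng.choice((1, -1))))
    return formula

def unit_clause_resolution(n, formula, rng):
    """Set one variable per step; return (assignment, number of clauses left unsatisfied)."""
    value = {}  # variable -> bool

    def lit_value(lit):
        v = value.get(abs(lit))
        return None if v is None else v == (lit > 0)

    for _ in range(n):
        # A clause is a unit clause if one literal is False and the other is unset.
        units = []
        for a, b in formula:
            va, vb = lit_value(a), lit_value(b)
            if va is None and vb is False:
                units.append(a)
            elif vb is None and va is False:
                units.append(b)
        if units:
            lit = rng.choice(units)            # satisfy a random unit clause
        else:
            unset = [x for x in range(1, n + 1) if x not in value]
            lit = rng.choice(unset) * rng.choice((1, -1))  # random literal
        value[abs(lit)] = lit > 0
    unsatisfied = sum(1 for a, b in formula if not (lit_value(a) or lit_value(b)))
    return value, unsatisfied

rng = random.Random(0)
n = 200
formula = random_2sat(n, int(1.2 * n), rng)
assignment, unsatisfied = unit_clause_resolution(n, formula, rng)
print(unsatisfied, "of", len(formula), "clauses left unsatisfied")
```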

Our analysis proceeds in three phases. Phase I proceeds until time
$T = 2\varepsilon n$, and we show that in this period, there is an exponentially small
chance that $m_1$ is ever much larger than its expectation. In Phase II, we
continue unit-clause resolution until $m_1(t) = 0$; we show that this happens
quickly. These will give the required bounds on the integrated number of
unit clauses, and in turn unsatisfied clauses, produced by unit-clause
resolution. In Phase III we have a formula of density $\le 1 - \varepsilon$, and we simply
apply the (non-algorithmic) proof of the Theorem's case $\lambda \le -1$, proved in
Section 5.1.

5.2.1 Useful facts 

We first establish a simple relation, useful for Phase I and essential for
Phase II. The number of 2-clauses remaining (both of whose variables
remain) at time $\delta n$ is $m_2(\delta n) \sim B(n(1 + \varepsilon), (1 - \delta)^2)$. Thus for all times






t ^ ^^{1 + e) (much longer than the times Q{en) in which we are inter- 
ested) , 



(10) Pr max m2 {6n) - n(l + e)(l - 6f ^ n^/^ I 1 ^ exp(-e(n^/^)) 



We prove (10) using the Chernoff bound that for a sum $X$ of independent
0-1 Bernoulli random variables with parameters $p_1, \ldots, p_n$ and expectation
$\mu = \sum_{i=1}^{n} p_i$,

(11) $\Pr(X \ge \mu + \Delta) \le \exp\left(-\Delta^2/(2\mu + 2\Delta/3)\right)$.

(See for example [JŁR00, Theorems 2.1 and 2.8].) To establish (10) we take
$(1 + \varepsilon)n$ i.i.d. Bernoullis with $p_i = (1 - \delta)^2$. For any fixed $\delta$ in (10) this
immediately gives probability exp(— G(n^/^/n)) , and the sum over the G(?i) 
possible values of 5 can be subsumed into the exponential. 
In the main we will therefore assume that 

(12) m2{6n) ^n{l + e){l-6f + n^'^, 

and deal with the failure case only at the end. 

We will also need two simple distributional inequalities. First, a
Bernoulli random variable is stochastically dominated by a similar Poisson
random variable,

$$\mathrm{Be}(p) \preceq \mathrm{Po}(-\ln(1 - p)),$$

as they give equal probability to 0, and the Bernoulli's remaining probability
is entirely on 1 whereas the Poisson's is on 1 and larger values. (Here
we have written $\mathrm{Be}(p)$ and $\mathrm{Po}(-\ln(1 - p))$ where we really mean random
variables with those distributions; we shall continue this practice where
convenient.) Summing $n$ independent copies of such random variables shows
that a binomial is dominated by a similar Poisson,

$$B(n, p) \preceq \mathrm{Po}(-n \ln(1 - p)).$$

In particular, for any $a, b = \Theta(1)$,

(13) $B(an, b/n) \preceq \mathrm{Po}(-an \ln(1 - b/n)) = \mathrm{Po}(ab + O(1/n))$.

We also recall that the exponential moments of a Poisson random variable
are

(14) $E z^{\mathrm{Po}(d)} = \exp((z - 1)d)$.
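Both facts are simple to check numerically. The following sketch (helper names are ours) compares the distribution functions of $B(n,p)$ and $\mathrm{Po}(-n\ln(1-p))$, where stochastic domination means the binomial CDF is at least the Poisson CDF at every point, and evaluates the Poisson generating function $E z^{\mathrm{Po}(d)}$ by direct summation against $\exp((z-1)d)$.

```python
import math

def binomial_cdf(n, p, k):
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def poisson_cdf(mean, k):
    return sum(math.exp(-mean) * mean**j / math.factorial(j) for j in range(k + 1))

def dominated(n, p):
    """True if P(B(n,p) <= k) >= P(Po(-n ln(1-p)) <= k) for k = 0..n (up to rounding)."""
    mean = -n * math.log(1 - p)
    return all(binomial_cdf(n, p, k) >= poisson_cdf(mean, k) - 1e-12
               for k in range(n + 1))

def poisson_pgf(z, d, terms=100):
    """E z^Po(d), summed term by term."""
    total, term = 0.0, math.exp(-d)   # term = P(Po(d) = j), starting at j = 0
    for j in range(terms):
        total += term * z**j
        term *= d / (j + 1)
    return total

print(dominated(30, 0.1), dominated(50, 1 / 50))
print(poisson_pgf(1.5, 3.0), math.exp((1.5 - 1) * 3.0))
```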

We now analyze the unit-clause algorithm in Phases I and II.

5.2.2 Phase I

During Phase I, assuming at times t = 6n, 

m2{t) = n(l + e)(l - 6f + ©(n^/^) ^ n(l + l.Ole - 26), 

using e ^ n^^/^. Meanwhile the number of unset variables is m{t) = n(l — 
5), so in particular, 

(15) m2(t)/m(t) ^ 1 + 1.05e. 

With the random variables below all independent, the unit-clause
algorithm gives

$$m_1(t) - m_1(t-1) = -1 + \mathbf{1}(m_1(t-1) = 0) - B(m_1(t-1), 1/(2m(t-1))) + B(m_2(t-1), 1/m(t-1))$$
$$\le -1 + \mathbf{1}(m_1(t-1) = 0) + B(m_2(t-1), 1/m(t-1))$$
$$\preceq -1 + \mathbf{1}(m_1(t-1) = 0) + \mathrm{Po}(1 + 1.1\varepsilon),$$

where the last inequality uses (15), (13), and $0.1\varepsilon \gg 1/n$.

It is easy to see that, starting from $m_1(0) = m_1'(0)$, if $X(t) \le Y(t)$
for all $t$, and if

$$m_1(t) - m_1(t-1) = \mathbf{1}(m_1(t-1) = 0) + X(t-1) \quad \text{and}$$
$$m_1'(t) - m_1'(t-1) = \mathbf{1}(m_1'(t-1) = 0) + Y(t-1),$$

then for all $t$,

$$m_1(t) \le m_1'(t).$$

(An easy proof is inductive. The $\mathbf{1}(\cdot)$ term may contribute to $m_1$ and not
to $m_1'$ if $m_1 < m_1'$, but in that case, the inequality still holds.) In a similar
setup but with $X(t) \preceq Y(t)$, coupling shows that $m_1(t) \preceq m_1'(t)$.
Thus $m_1(t) \preceq m_1'(t)$ where $m_1'(0) = 0$ and

$$m_1'(t) - m_1'(t-1) = -1 + \mathbf{1}(m_1'(t-1) = 0) + \mathrm{Po}(1 + 1.1\varepsilon).$$

Now, let $U(t)$ be a random walk with $U(0) = 0$ and independent increments

(16) $U(t) - U(t-1) = -1 + \mathrm{Po}(1 + 1.1\varepsilon)$,

and let $V(t)$ count the "record minima" of $U$, so $V(0) = 0$ and $V(t) =
V(t-1)$ except that if $U(t) < \min_{\tau < t} U(\tau)$, then $V(t) = V(t-1) + 1$.
Observe that

(17) $m_1(t) \preceq m_1'(t) = U(t) + V(t-1)$.

($V(t)$ precisely takes care of the $\mathbf{1}(\cdot)$ terms.)

At this point, we have reduced the behavior of the number of unit clauses
$m_1(t)$ to properties of a simple Poisson-incremented random walk.

Renewal process V

We first dispense with $V$, by showing that

(18) $V(\infty) = \sup_t V(t) \preceq G(2\varepsilon)$,

where $G(p)$ indicates a geometric random variable with parameter $p$. Starting
from any time $t_0$ at which $U(t_0)$ is a record minimum (at which
$V(t_0) = V(t_0 - 1) + 1$), define $U'(\tau) = U(t_0 + \tau) - U(t_0) + 1$. Observe
that $U'(0) = 1$, and the first time $\tau$ for which $U'(\tau) = 0$ gives the next time
$t_0 + \tau$ at which $V$ increases by 1. Thus the number of "restarts" of the process
$U'$ is $V(\infty)$.

$U'$ may be viewed as a Galton-Watson branching process observed each
time an individual gives birth (adding $\mathrm{Po}(\cdot)$ offspring to the population)
and itself dies (adding $-1$). As a super-critical Galton-Watson branching
process, $U'$ has a positive probability of non-extinction, and thus the number
of restarts (following extinctions) is geometrically distributed.

Quantitatively, the extinction probability of a Galton-Watson process
with $X$ offspring (the probability the process ever hits 0) is well known to
be the unique root $p \in [0, 1)$ of

(19) $p = E(p^X)$.

(See for example [Dur96, pp. 247-248].) Also, for any $p$ such that $p > E(p^X)$,
the probability of non-extinction exceeds $1 - p$. In this case, recalling (16)
and (14), we seek $p$ such that

$$p > E(p^X) = \exp((p - 1)(1 + 1.1\varepsilon))$$

or equivalently, with $q = 1 - p$,

$$\ln(1 - q) > -q(1 + 1.1\varepsilon).$$

Taking a Taylor expansion around $q = 0$ and cancelling like terms, it suffices
to ensure that $\frac12 q + \frac13 q^2 + \cdots < 1.1\varepsilon$, and $q = 2\varepsilon$ suffices (for all $\varepsilon < 0.37$,
let alone the $\varepsilon = \Theta(n^{-1/3})$ of interest).

Thus $U'$ has non-extinction probability at least $2\varepsilon$, verifying (18).

Random walk U

We now analyze the random walk $U$ to show that for any $0 < \varepsilon \le 0.02$ and
$0 < \alpha \le 0.06$ (our principal realm of interest will be $\varepsilon, \alpha = \Theta(n^{-1/3})$), for
any time $t$,

(20) $\Pr\bigl(\max_{\tau \le t} U(\tau) \ge E U(t) + \alpha t\bigr) \le \exp(-t\alpha^2/2.1)$.

Observe that $U(t)$ is a submartingale, and for any $\beta > 0$ (by convexity
of $\exp(\beta u)$), $\exp(\beta U(t))$ is a non-negative submartingale. It follows from
Doob's submartingale inequality (see [Dur96]) that

$$\Pr\bigl(\max_{\tau \le t} U(\tau) \ge E U(t) + \alpha t\bigr)
= \Pr\bigl(\max_{\tau \le t} \exp(\beta U(\tau)) \ge \exp(\beta(E U(t) + \alpha t))\bigr)$$

(21) $\le \dfrac{E(\exp(\beta U(t)))}{\exp(\beta(E U(t) + \alpha t))}$.

Trivially,

(22) $E U(t) = -t + (1 + 1.1\varepsilon)t = 1.1\varepsilon t$,

and, by (14),

$$E(\exp(\beta U(t))) = E \exp(-\beta t + \beta\,\mathrm{Po}((1 + 1.1\varepsilon)t))
= \exp(-\beta t) \exp((e^{\beta} - 1)(1 + 1.1\varepsilon)t),$$

so (21) is

(23) $\exp\bigl(-t[\beta - (1 + 1.1\varepsilon)(e^{\beta} - 1) + \beta(1.1\varepsilon + \alpha)]\bigr)$.

We are free to choose $\beta > 0$ as we like, so to minimize (23) we maximize the
innermost quantity. Setting its derivative equal to 0 yields
$1 - (1 + 1.1\varepsilon)e^{\beta} + 1.1\varepsilon + \alpha = 0$ or $\beta = \ln(1 + \alpha/(1 + 1.1\varepsilon))$, but we will
simply take $\beta = \alpha$. Then (eschewing asymptotes in favor of absolute bounds),
for $\varepsilon \le 0.02$ and $\alpha \le 0.06$ (let alone the regime $\varepsilon, \alpha = \Theta(n^{-1/3})$ of interest),
(23) is

$$\le \exp(-t\alpha^2/2.1),$$

proving (20).
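The absolute constant can be checked directly. The sketch below (names are ours) evaluates the bracketed exponent, read here as $\beta - (1 + 1.1\varepsilon)(e^{\beta} - 1) + \beta(1.1\varepsilon + \alpha)$, at the simple choice $\beta = \alpha$ and verifies on a grid that it is at least $\alpha^2/2.1$ throughout $0 < \varepsilon \le 0.02$, $0 < \alpha \le 0.06$. It also compares with the optimal $\beta = \ln(1 + \alpha/(1 + 1.1\varepsilon))$.

```python
import math

def doob_exponent(beta, eps, alpha):
    """beta - (1 + 1.1 eps)(e^beta - 1) + beta (1.1 eps + alpha)."""
    return beta - (1 + 1.1 * eps) * math.expm1(beta) + beta * (1.1 * eps + alpha)

def bound_holds(eps, alpha):
    """With beta = alpha, is the exponent at least alpha^2 / 2.1?"""
    return doob_exponent(alpha, eps, alpha) >= alpha**2 / 2.1

# Grid over eps = 0.001..0.020 and alpha = 0.001..0.060.
grid_ok = all(bound_holds(e / 1000, a / 1000)
              for e in range(1, 21) for a in range(1, 61))
print(grid_ok)

eps, alpha = 0.02, 0.06            # the extreme corner, where the margin is smallest
beta_opt = math.log(1 + alpha / (1 + 1.1 * eps))
print(doob_exponent(alpha, eps, alpha), doob_exponent(beta_opt, eps, alpha), alpha**2 / 2.1)
```

The margin at the corner is small, which suggests the constant 2.1 was chosen to be nearly tight for these ranges.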
Parameter substitution

In the case of interest, $1 \le \lambda \ll n^{1/3}$, $\varepsilon = \lambda n^{-1/3} \ll 1$, and
$t = 2\varepsilon n = 2\lambda n^{2/3}$. Here,

$$E m_1(t) \le E U(t) + E V(\infty) \le 1.1\varepsilon t + \frac{1}{2\varepsilon} \le 2.2\lambda^2 n^{1/3} + n^{1/3} \le 3.2\lambda^2 n^{1/3}.$$
Substituting a = ol j^ft into (PU)) (wliich is then valid up to ol = n^/^ = 

(24) $\Pr\bigl(\max_{\tau \le t} U(\tau) \ge E U(t) + \alpha' \sqrt{2\lambda}\, n^{1/3}\bigr) \le \exp(-\alpha'^2/2.1)$,

so the tails of $U(t)$ fall off exponentially with a "half-life" smaller than the
bound on the mean (as $\lambda \ge 1$ implies $\sqrt{2\lambda} < 2.2\lambda^2$). $V(\infty)$ has an
expectation which is at most comparable, and (as a geometric random variable)
again falls off exponentially with half-life comparable to its mean.
It follows that

$$E \sum_{\tau = 1}^{2\varepsilon n} m_1(\tau) \le (2\varepsilon n)\, E\bigl(U(2\varepsilon n) + V(\infty)\bigr) \le 6.4\lambda^3 n$$

and, for a' ^ n^/^, 

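(25) $\Pr\bigl(\max_{\tau \le 2\varepsilon n} m_1(\tau) \ge \alpha' \cdot 3.2\lambda^2 n^{1/3}\bigr) = \exp(-\Omega(\alpha'))$ and

(26) $\Pr\bigl(\sum_{\tau \le 2\varepsilon n} m_1(\tau) \ge \alpha' \cdot 6.4\lambda^3 n\bigr) = \exp(-\Omega(\alpha'))$.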

The probability of a deviation with q' > n^/^ is exp(— f](n^/^)) , and will 
be dealt with as a "failure probability" at the end. 

5.2.3 Phase II

The analysis of this phase largely parallels the previous one.

Assuming (12), at times $t = \delta n$, $m_2(t)/m(t)$ is roughly $(1 + \varepsilon)(1 - \delta)$,
and in particular, since in Phase II by definition $\delta \ge 2\varepsilon$,

(27) $m_2(t)/m(t) \le 1 - 0.95\varepsilon$.

Since Phase II ends as soon as $m_1(t) = 0$, there is no $+\mathbf{1}(\cdot)$ term to
worry about, so assuming (27),

$$m_1(t) - m_1(t-1) = -1 - B(m_1(t-1), 1/(2m(t-1))) + B(m_2(t-1), 1/m(t-1))
\preceq -1 + \mathrm{Po}(1 - 0.9\varepsilon).$$

By the same argument as for Phase I, then,

$$m_1(2\varepsilon n + t) \preceq m_1(2\varepsilon n) + W(t)$$

where $W(t)$ is a random walk with $W(0) = 0$ and independent increments
$-1 + \mathrm{Po}(1 - 0.9\varepsilon)$.

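Note that $W(\varepsilon n) = W(\lambda n^{2/3})$ has mean and standard deviation both
$\Theta(n^{1/3})$, so for multiples of this time, $W$ is exponentially sure to achieve at
least half its (negative) expectation; we now quantify this. At time $\alpha\varepsilon n$,

$$\Pr\bigl(W(\alpha\varepsilon n) > -\tfrac12 \alpha\varepsilon^2 n\bigr)
= \Pr\bigl(\mathrm{Po}(\alpha\varepsilon n(1 - 0.9\varepsilon)) > E(\mathrm{Po}(\cdot)) + 0.4\alpha\varepsilon^2 n\bigr)$$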

^ f (0.4a£^n)^ \ 

^ V aen(l - 0.9e) + OAae'^n J 

since the Chernoff bound (11) applies as well to the Poisson. Substituting
$\varepsilon = \lambda n^{-1/3}$, the denominator's first term, of order $\Theta(\alpha n^{2/3})$, dominates the
second, of order $\Theta(\alpha n^{1/3})$, giving

(28) Pr{W{aen) > -\aE^n) ^ exp(-0.42aA^). 

Then, conditionally on Phase I ending at $m_1(2\varepsilon n) = \alpha_1 \lambda^2 n^{1/3}$ (see (25)),
for any $\alpha > 2\alpha_1$, (28) implies that Phase II ends by time $2\varepsilon n + \alpha\varepsilon n$, except
with probability exponentially small in $\alpha$.

Furthermore, over Phase II, $m_1(t)$ is unlikely ever to increase much over
its initial value. An argument along the lines used in the context of
equation (19) could be constructed to show that $\max_{t \ge 0} W(t)$ is exponentially
sure to be quite small, but as there are some technical complications, we
take a simple, wasteful approach. Observe that

$$W(t) \preceq X(t)$$

where $X(0) = W(0) = 0$ and $X(t)$ has independent increments $-1 + \mathrm{Po}(1 + \varepsilon)$.
This wild over-estimation is useful because $X$ (unlike $W$) is a
submartingale, to which we apply Doob's inequality. Just as in the analysis of
Phase I (Section 5.2.2), over a time interval $\alpha\varepsilon n$, $X$ is exponentially unlikely
ever to exceed a multiple of its final expectation, $E X(\alpha\varepsilon n) = \alpha\varepsilon^2 n = \alpha\lambda^2 n^{1/3}$, and
so $W$ and in turn $m_1$ are at least as unlikely to rise more than this amount
above their initial values.

So, conditionally on Phase II starting at $m_1 = \alpha_1 \lambda^2 n^{1/3}$, Phase II finishes
within additional time $\alpha\lambda n^{2/3}$ with exponentially high probability for all
$\alpha \ge 2\alpha_1$, and within that additional time, $m_1$ is exponentially unlikely (in
$\beta$) to exceed $(\alpha_1 + \beta\alpha)\lambda^2 n^{1/3}$. It follows that if Phase II ends at time $2\varepsilon n + t$,

(29) Pr "^1 (2^^ + r)> const ^ exp(-/3') ■ 
5.2.4 Phases I, II and III

We have argued that over Phases I and II the number of unit clauses $m_1(t)$
is exponentially unlikely ever to exceed a multiple of $\varepsilon^2 n = \lambda^2 n^{1/3}$, and
that Phase II is exponentially unlikely to end after a multiple of time
$\varepsilon n = \lambda n^{2/3}$, to prove, in (26) and (29), that the summed number of unit clauses
$M_1 = \sum_{\tau} m_1(\tau)$ (summed over times $\tau$ from 0 to the end of Phase II), is
exponentially unlikely to exceed a multiple of $\lambda^3 n$:

$$\Pr(M_1 \ge \text{const} \cdot \beta\lambda^3 n) \le \exp(-\beta).$$

By definition of the unit-clause algorithm, at each stage the literals
forming the unit clauses are drawn independently at random with replacement
from among the literals not yet set, and so the number of unit clauses
dissatisfied at each step $t$ is

(30) $B(m_1(t), 1/(2(n - t)))$

(where mi(t) is itself a random variable). With probability 1 — 
exp(— 0(n^/^)) these phases end long before time t = n/3, so H30() is 
^ Po(0.8mi(t)/n), and by independence of the random variables in ^^0^ 
(each conditioned on mi[t)) for different times t, the total number of unit 
clauses dissatisfied in phases I and II is dominated by Po(0.8Mi/?i). 

Since $E M_1 = O(\lambda^3 n)$, the Poisson's expectation is $O(\lambda^3)$, and the
number $X$ of unit clauses unsatisfied over these phases also has $E X = O(\lambda^3)$;
this confirms (for Phases I and II) one assertion of the theorem. Fixing
$\beta = 1$, there is at least constant probability that $M_1 \le \text{const} \cdot \lambda^3 n$ and so the
probability that no unit clause is dissatisfied is $\Pr(X = 0) \ge \exp(-O(\lambda^3))$,
a second assertion of the theorem. Since both $M_1$ and $\mathrm{Po}(M_1/n)$ have
exponential tails, so does $X$, with $\Pr(X \ge \beta \cdot \text{const} \cdot \lambda^3) \le \exp(-\beta)$, a third
assertion of the theorem. We now argue that Phase III leaves all these
properties intact.

By construction, at the conclusion of Phases I and II the remaining
formula is uniformly random, still on $n(1 - o(1))$ variables, but now with
density $\le 1 - \varepsilon \le 1 - n^{-1/3}$. For Phase III we simply argue that, by the
previously proved case $\lambda \le -1$ of this Theorem, such a formula can be
satisfied but for $\le \text{const} \cdot \beta$ clauses, with probability $\ge 1 - \exp(-\beta)$. This
concludes the proof of the case $\lambda \ge 1$ of the theorem. □

5.2.5 Remarks

A corresponding lower bound, on the number of clauses that must be
violated, cannot be found by the same techniques, since there is no guarantee
that the unit-clause algorithm is doing the best possible. One alternative is
to analyze the pure-literal rule, which is guaranteed to make no "mistakes"
as long as it runs, then use other methods to analyze the remaining "core"
formula; we understand that this analysis has been done successfully (and
independently) by Kim [Kim]. Another approach might be to extend the
"bicycles" analysis for fixed $c < 1$ (or the $\lambda \le -1$ case of the scaling-window
theorem) to the case $\varepsilon > 0$ (particularly, $\varepsilon = \lambda n^{-1/3}$ and $\lambda > 1$), but this
seems not to be easy.

We remark that, no matter the particular approach pursued, verifying
that the number of clauses that must be dissatisfied is $\Omega(\lambda^3)$ seems to lead
back, in intuition and in proof techniques, to the fact that in a $G_{n,p}$ random
graph with average degree $np = 1 + \lambda n^{-1/3}$, there is likely to be a giant
component whose "kernel" is a random cubic graph on $\Theta(\lambda^3)$ vertices
[JŁR00, p. 123].

6 Random MAX k-SAT and MAX CSP

In this section we present some general facts and conjectures about MAX
k-SAT and MAX CSP, and generalize the 2-SAT high-density results.

6.1 Concentration and limits

It is known that random $k$-SAT has a sharp threshold: that is, there exists
a threshold function $c(n)$ such that for any $\varepsilon > 0$, as $n \to \infty$, a random
formula on $n$ variables with $(c(n) - \varepsilon)n$ clauses is a.s. satisfiable, while one
with $(c(n) + \varepsilon)n$ clauses is a.s. unsatisfiable [Fri99]. To prove an analogous
result for random MAX k-SAT is much easier; this was first done by [BFU93].
We will employ a "bounded difference" inequality; specifically, a
generalization of Azuma's inequality in a form due to McDiarmid [McD89] (see also
Bollobás [Bol88]).

Theorem 10 (Azuma) Let $X_1, \ldots, X_n$ be independent random variables,
with $X_k$ taking values in a set $A_k$ for each $k$. Suppose that the (measurable)
function $f : \prod A_k \to \mathbb{R}$ satisfies $|f(x) - f(x')| \le c_k$ whenever the vectors
$x$ and $x'$ differ only in the $k$'th coordinate. Let $Y$ be the random variable
$f(X_1, \ldots, X_n)$. Then for any $\lambda > 0$, $P[|Y - EY| \ge \lambda] \le 2\exp(-2\lambda^2/\sum c_k^2)$.

Let $F_k(n, m)$ be a random k-SAT formula on $n$ variables with $m$ clauses,
and let $f_k(n, m) = E(\max F_k)$; we may omit the subscripts $k$.

Theorem 11 ([BFU93]) For all $k$, $n$, $c$, and $\lambda$,
$P(|\max F_k(n, cn) - f_k(n, cn)| > \lambda) < 2\exp(-2\lambda^2/(cn))$.

Proof. Let $X_i$ represent the $i$th clause in $F$. Replacing $X_i$ with an
arbitrary clause cannot change $\max F$ by more than 1. The result follows
from Azuma's inequality. □
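To get a feel for the scale of this concentration, the bound can simply be evaluated. The sketch below is illustrative (the function names are ours); it computes the tail bound $2\exp(-2\lambda^2/(cn))$ and, inverting it, the deviation $\lambda$ at which the bound drops to a target probability $\delta$, which is of order $\sqrt{cn}$.

```python
import math

def maxsat_tail_bound(n, c, lam):
    """The bound 2 exp(-2 lam^2 / (c n)) on P(|max F - f| > lam)."""
    return 2.0 * math.exp(-2.0 * lam**2 / (c * n))

def deviation_for(n, c, delta):
    """The deviation lam at which the bound above equals delta."""
    return math.sqrt(c * n * math.log(2.0 / delta) / 2.0)

n, c = 10**6, 2.0
lam = deviation_for(n, c, 0.01)
print(lam, maxsat_tail_bound(n, c, lam))
```

With $n = 10^6$ and $c = 2$ the deviation needed for a 1% bound is a few thousand clauses out of two million, so $\max F/(cn)$ is determined to within a fraction of a percent.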

The theorem's statement that for any $c$ and large $n$, $\max F(n, cn)/(cn)$ has
some almost-sure almost-exact value, is reminiscent of Friedgut's theorem
(Theorem 12) that (loosely interpreted) says that for large $n$ and any $c$ away
from the threshold, $\Pr(F(n, cn) \text{ is satisfiable})$ is almost exactly either 0 or 1.
In our case, the target value $f(n, cn)/(cn)$ is unknown and it is unknown
whether it has a limit in $n$, and in Friedgut's case, again, it is unknown for
which $c$'s the probability is near 0 and for which it is near 1, and whether
the threshold value of $c$ (and the distribution function) has a limit in $n$. To
conjecture that $f(n, cn)/(cn)$ tends to a limit in $n$ is in this sense analogous
to the "satisfiability threshold conjecture".

Conjecture 12 (MAX SAT limiting function conjecture) For every $k$, for
every constant $c > 0$, as $n \to \infty$, $f_k(n, cn)/n$ converges to a limit.

The conjecture may equally well be extended to arbitrary CSPs, yet is open 
even for MAX 2-SAT. 

If fk{n,cn)/{cn) were monotone in n, the conjecture's truth would fol- 
low. Of course we do not know this, but can prove monotonicity in c: that 
as the number of clauses increases, the expected fraction of clauses that can 
be satisfied can only decrease. 

Remark 13 For any $k$ and $n$, $f_k(n, m)/m$ is a non-increasing function
of $m$.

Proof. In a uniform random instance of $F_k(n, m)$, let the maximum
number of satisfiable clauses be $J$, so that $E(J) = f(n, m)$. By deleting
single clauses, we obtain $m$ uniform random instances $F'$ of $F(n, m - 1)$.
Of these, $m - J$ each have $\max F' = J$, while the remaining $J$ each
have $\max F' \in \{J - 1, J\}$. The average of these $m$ values is at least
$\frac{(m - J)(J) + (J)(J - 1)}{m} = \frac{J(m - 1)}{m}$. Taking expectations, we find

$$\frac{f(n, m-1)}{m - 1} \ge \frac{1}{m - 1} E\Bigl(\frac{J(m - 1)}{m}\Bigr) = E\Bigl(\frac{J}{m}\Bigr) = \frac{f(n, m)}{m},$$

as desired. □
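The monotonicity can be confirmed exactly on a toy instance by enumerating every formula. The sketch below is an illustration under stated assumptions: it takes $k = 2$, $n = 3$, clauses drawn uniformly with replacement from the 12 possible 2-clauses on distinct variables, and computes $f(n, m)$ as an exact fraction for $m = 1, \ldots, 4$. The function name is ours.

```python
from fractions import Fraction
from itertools import combinations, product

def expected_max_sat(n, m):
    """Exact E(max F) over all ordered m-tuples of 2-clauses on n variables."""
    assignments = list(product((False, True), repeat=n))
    clauses = [((i, si), (j, sj))
               for i, j in combinations(range(n), 2)
               for si in (False, True) for sj in (False, True)]
    # sat[c][a] = 1 if clause c is satisfied by assignment a, else 0
    sat = [[int(x[i] == si or x[j] == sj) for x in assignments]
           for (i, si), (j, sj) in clauses]
    total = 0
    for formula in product(range(len(clauses)), repeat=m):
        total += max(sum(sat[c][a] for c in formula)
                     for a in range(len(assignments)))
    return Fraction(total, len(clauses) ** m)

ratios = [expected_max_sat(3, m) / m for m in range(1, 5)]
print(ratios)
```

On this toy instance the ratio stays at 1 until $m = 4$, the first point at which an unsatisfiable formula on three variables can occur.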

Finally, we expect a connection between the max sat limiting function
conjecture (Conjecture 12) above and the usual satisfiability threshold
conjecture. We formalize this in the following conjecture.

Conjecture 14 For any $c > 0$, $\lim_{n \to \infty} f(n, cn)/(cn) = 1$ if and only if
$\lim_{n \to \infty} \Pr(F(n, cn) \text{ is satisfiable}) = 1$.

One aspect of this is easily resolved. If $\limsup f(n, cn)/(cn) < 1$,
say $1 - \delta$, then on average $c\delta n$ clauses per formula go unsatisfied,
at least a $\delta$ fraction of all formulas must be unsatisfiable, and so
$\limsup \Pr(F(n, cn) \text{ is satisfiable}) < 1$. But nothing more seems obvious.

6.2 High-density MAX k-SAT and MAX CSP 

In this section we extend Theorem H) 

ck 



Theorem 15 For all k, for all c sufficiently large, ( ^^^ -|- -j^r[\/ 



0(l))n</fc(n,cn)<(2_lc + VH 



n. 



Note that the leading terms are equal, and the second-order terms equal to 
within const -v^. 

Proof. Upper bound. The proof is very similar to that of Theorem |1J 
Using the first-moment method, we have: 

P = P(3 satisfiable F') 



1=0 ^ ^ 



For r < ^ the sum is dominated by the last term, and so we fix / 



rcn. 



Using taking logarithms, and finally substituting r = ^ — e, we have 



on c — \ 
Thus for r < l/{2^) - ^ ^^^2^^^, $P \to 0$ as $n \to \infty$.

Lower bound. Set the variables sequentially. Set variables
$X_1, X_2, \ldots, X_{k-1}$ randomly, and then for each $k \le i \le n$, enumerate those
clauses involving only $X_i$ and some subset of $\{X_1, X_2, \ldots, X_{i-1}\}$ (that is,
unit clauses). The expected number of such clauses is about
n n n 

and if we count only those left unsatisfied by their previous k — 1 variables, 
the expected number becomes 

(Here we incur a minor error by sampling with replacement instead of with- 
out; (^)^~^ should really be ni^/i^A;-i(^r^)-) More precisely, the number 
of such clauses enjoys a Poisson distribution with mean hi. Set the value of 
Xi to maximize the number of such clauses satisfied; as before, this number 



is about + \\ r^h^ + 0(1). The advantage over purely random guessing 



IS 



i^^ + 0(l) = J-^(l-^)-i + 0(l). 



27r ^ ' V 27r2'=-i ' n 

Sum over i = i, . . . ,n to obtain an advantage of 



ck 2n „, , 
+ 0(n). 



7r2'= A; + 1 

□ 

Still more generally, we may consider a CSP (constraint satisfaction
problem). Let $g$ be a $k$-ary "constraint" function, $g : \{0,1\}^k \to \{0,1\}$. A
random formula $F_g(n, m)$ over $g$ is defined by $m$ clauses, each chosen
uniformly at random (with replacement) from the $2^k n(n-1) \cdots (n-k+1)$
possible clauses defined by an ordered $k$-tuple of distinct variables each
appearing positively or negated. (Formally, a clause consists of a $k$-tuple
$(i_1, \ldots, i_k)$ of distinct values in $[n]$, specifying the variables, and a binary
$k$-vector $(\sigma_1, \ldots, \sigma_k)$, specifying their signs.) A clause with variables (signed
variables) $X_1, \ldots, X_k$ is satisfied if $g(X_1, \ldots, X_k) = 1$. (Formally, an
assignment $x_1, \ldots, x_n$ of the full set of variables $X_1, \ldots, X_n$ satisfies a clause as
above if $g(x_{i_1} \oplus \sigma_1, \ldots, x_{i_k} \oplus \sigma_k) = 1$, where "$\oplus$" denotes XOR, or addition
modulo 2.) As ever, such a formula $F$ is satisfiable if there exists an
assignment of the variables satisfying all the clauses; and $\max F$ is the maximum,
over all assignments, of the number of clauses satisfied.

Generally a CSP may be based on a finite family of constraint functions,
of "arities" bounded by $k$, but for notational convenience we limit ourselves
to a single function.

Let a $k$-ary clause function $g$ be given, with $E(g(X)) = p$ over random
inputs. Define $P = \min\{p, 1 - p\}$ and $Q = 1 - P$. Let $F_g(n, m)$ be a random
formula over $g$ on $n$ variables, with $m$ clauses, and let $f_g(n, m) = E(\max F)$.
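The formal definition translates directly into code. The sketch below is only an illustration (the function names and the 0-based variable indexing are our choices): a clause is a pair of tuples (variable indices, sign bits), satisfaction applies $g$ to the XOR of each assigned value with its sign bit, and $\max F$ is found by brute force over all $2^n$ assignments.

```python
from itertools import product

def clause_satisfied(g, variables, signs, x):
    """True if g(x[i_1] XOR s_1, ..., x[i_k] XOR s_k) == 1 for the 0/1 assignment x."""
    return g(*(x[i] ^ s for i, s in zip(variables, signs))) == 1

def max_f(g, n, formula):
    """Brute-force max F: the largest number of clauses satisfied by any assignment."""
    return max(sum(clause_satisfied(g, vs, ss, x) for vs, ss in formula)
               for x in product((0, 1), repeat=n))

g_or = lambda a, b: a | b            # k = 2 with g = OR gives ordinary 2-SAT
g_xor3 = lambda a, b, c: a ^ b ^ c   # k = 3 parity constraint

# All four sign patterns on the same pair of variables: at most 3 can hold at once.
two_sat = [((0, 1), s) for s in product((0, 1), repeat=2)]
print(max_f(g_or, 2, two_sat))

# Two parity clauses differing in one sign contradict each other.
parity = [((0, 1, 2), (0, 0, 0)), ((0, 1, 2), (1, 0, 0))]
print(max_f(g_xor3, 3, parity))
```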

Theorem 16 Given an arity k and a constraint function g, for all c suffi- 
ciently large, {pc+ PQ'^cjk^n < fg{n,m) < (pc-|- ^J2PQ ln(2)c)n. 

The proof follows that of Theorem 15 and is omitted.

7 Online random MAX 2-SAT 

In this section, we discuss online versions of the max 2-sat problem. [BFOH 
IBFW02] consider an online version of max giant-free spanning subgraph, in 
which random edges are given one by one, and we must accept or reject 
Ci based on the previous edges ei, . . . ,ei_i, with the goal of accepting as 
many edges as possible without creating a giant component. 

There are two natural online interpretations of random MAX 2-SAT. In
both, we are told in advance the total number of variables $n$ and clauses $m$;
also, in both, clauses $c_i$ are presented one by one, and we must choose "on
line" whether to accept or reject $c_i$ based on the previously seen clauses
$c_1, \ldots, c_{i-1}$. When we accept a clause we are guaranteeing to satisfy it;
when we reject a clause we are free to satisfy or dissatisfy it. Our goal is to
maximize the number of clauses accepted.

In our first interpretation of online MAX 2-SAT, Online I, when we
accept a clause, we are also required to satisfy it immediately, by setting
at least one of its literals True; once a variable is set, it may never be
changed. The second interpretation, Online II, is more generous: the
variables' assignments may be decided after the last clause is presented.
Let $f_{O\text{-}I}(n,m)$ be the expected number of clauses accepted by an optimal
algorithm for Online I, and $f_{O\text{-}II}(n,m)$ that for Online II. Clearly,
$\frac34 m \le f_{O\text{-}I}(n,m) \le f_{O\text{-}II}(n,m) \le f(n,m)$. Here we present a "lazy" algorithm
applicable to both $f_{O\text{-}I}(n,cn)$ and $f_{O\text{-}II}(n,cn)$. Online-Lazy begins with
no variables "set". On presentation of a clause, Online-Lazy rejects it only
if it must, and otherwise does the least it can to accept it. Specifically, on
presentation of clause $c_i$, which without loss of generality we may consider
to be $(X \lor Y)$, it takes the following action. If $X$ = True or $Y$ = True,
accept $c_i$. If $X$ = False and $Y$ = False, reject $c_i$. If $X$ = False and $Y$ is
unset (or vice versa), set $Y$ = True (resp. $X$ = True) and accept $c_i$. If $X$
and $Y$ are both unset, arbitrarily choose one, set it True, and accept $c_i$.
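
The case analysis above translates directly into code. The following is a minimal sketch under our own assumptions: a clause is encoded as a pair of (variable index, sign) literals, with sign True for a positive literal; the names `online_lazy` and `random_2sat` are hypothetical; and the "arbitrary" choice when both variables are unset is resolved by taking the first literal.

```python
import random

def online_lazy(clauses):
    """Run Online-Lazy on a sequence of 2-clauses.

    Returns (number of accepted clauses, dict of variable settings).
    """
    value = {}      # variables set so far; once set, never changed
    accepted = 0
    for (x, sx), (y, sy) in clauses:
        # Truth value of each literal: True, False, or None if unset.
        lx = (value[x] == sx) if x in value else None
        ly = (value[y] == sy) if y in value else None
        if lx is True or ly is True:
            accepted += 1            # already satisfied: set nothing
        elif lx is False and ly is False:
            continue                 # both literals False: must reject
        elif lx is None:
            value[x] = sx            # make the literal on x True
            accepted += 1
        else:
            value[y] = sy            # x's literal is False, y is unset
            accepted += 1
    return accepted, value

def random_2sat(n, m, rng):
    """m clauses, each on two distinct variables with independent random signs."""
    return [
        tuple((v, rng.random() < 0.5) for v in rng.sample(range(n), 2))
        for _ in range(m)
    ]
```

Running `online_lazy(random_2sat(n, n, random.Random(0)))` for moderately large `n` gives an empirical acceptance count that can be compared with the asymptotic value stated in Theorem 17.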

Theorem 17 For any fixed $c$, Online-Lazy is the unique (up to its arbitrary
choice) optimal algorithm for Online I, and
$f_{O\text{-}I}(n,cn) \sim (\frac34 c + (1-e^{-c})/4 + (1-e^{-c})^2/8)n \le (\frac34 c + \frac38)n$.

We note that for $c = 1$, $f_{O\text{-}I}(n,n) \approx 0.957977n$, and for $c$ asymptotically
large, $f_{O\text{-}I}(n,cn) \sim (\frac34 c + \frac38)n$.
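
For a quick numerical check of these values, the limiting expression of Theorem 17 (as it is derived in the proof of performance below) can be evaluated directly. The function name `online_lazy_fraction` is our own, hypothetical, choice.

```python
import math

def online_lazy_fraction(c):
    """(3/4)c + (1 - e^{-c})/4 + (1 - e^{-c})^2/8, the limit of f_{O-I}(n, cn)/n."""
    s = 1.0 - math.exp(-c)
    return 0.75 * c + s / 4.0 + s * s / 8.0

# The excess over (3/4)c increases towards 3/8 as c grows.
for c in (0.5, 1.0, 2.0, 10.0):
    print(c, online_lazy_fraction(c), 0.75 * c + 0.375)
```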

Proof of optimality. On appearance of a clause $c_i$, it is clearly best not to set
any variable not appearing in $c_i$, for this merely imposes extra constraints.
Similarly, if $c_i$ is already satisfied by one of its literals, then it is best to
accept it and to set no additional variables.

The only interesting cases, then, are if $c_i$ is not already satisfied, but
one or both of its variables are unset. Again, if both variables are unset,
it is best to set at most one of them, and it doesn't matter which one: the
"future" performance of an optimal algorithm is solely a (random) function
of the number of unset variables and the number of clauses remaining, and
these parameters of the future, as well as the number of clauses accepted in
the past, are the same whether $c_i$'s first or second literal is set.

It only remains to show that if $c_i$ is not satisfied by a variable already
set, and at least one of its variables is not yet set, then an optimal algorithm
must set one of its literals to True. Consider a putatively optimal algorithm
Opt which does not do this, so for a literal $X$ in $c_i$, either Opt sets $X$ to
False, or it leaves $X$ unset.

In the case when Opt sets $X$ to False, let a competing algorithm Opt'
set $X$ to True, then simulate Opt but reversing the roles of $X$ and $\bar{X}$ in
future clauses. "Couple" the distribution of future random clauses seen by
Opt and Opt', also by reversing the roles of $X$ and $\bar{X}$. With this coupling,
Opt' accepts exactly the same number of clauses as Opt in the future, but
has accepted one additional clause so far ($c_i$); this contradicts the supposed
optimality of Opt.

The slightly less obvious case is when Opt leaves $X$ unset. Again we
introduce a competing algorithm Opt', which sets $X$ to True, then simulates
Opt until such time as Opt sets $X$. For inputs where Opt never sets $X$,
Opt' accepts every clause that Opt accepts, as well as the clause $c_i$, and
perhaps additional clauses in which $X$ appears; Opt' is strictly better on
these inputs. For inputs where Opt eventually sets $X$ to True, Opt' goes
on simulating Opt, again performing exactly as well on future clauses, and
strictly better on past ones. For inputs where at time $j > i$, Opt sets $X$ to
False, Opt' may simulate Opt but (as before) with the roles of $X$ and $\bar{X}$
reversed. With the previous coupling, on these inputs, Opt' accepts exactly
as many future clauses as Opt, and at least as many in the past (Opt'
has accepted $c_i$ and perhaps other clauses rejected by Opt, while Opt has
accepted $c_j$ and no other clause rejected by Opt'). So in all three cases,
the expected number of clauses accepted by Opt' is at least as many as for
Opt, and in the first two cases, which occur with nonzero probability (for
example, if no future clause contains $X$), strictly more; this contradicts the
supposed optimality of Opt.

Proof of performance. 

Note that clauses causing a variable to be set by Online-Lazy are 
always satisfied, and those not causing a variable to be set are satisfied with 
probability 3/4 (if both variables are set) or 1 (if one is set satisfyingly) . 

If $k$ variables are yet to be set, the probability that a clause has neither
variable set is $(k/n)^2$, the probability it has one variable set non-satisfyingly
and the other not set is $2 \cdot \frac12 \cdot ((n-k)/n)(k/n)$, so a random clause falls
into one of these cases w.p. $k/n$. The expected time to set another variable
when $k$ are unset is thus $n/k$. In this period, clauses have (unconditioned)
probabilities $(n-k)^2/n^2$ that both variables are set, and $k(n-k)/n^2$ that
one is set satisfyingly and the other unset; conditional upon one or other of
these being the case (a variable is not set for this clause), the probabilities
are $(n-k)/n$ for the first case and $k/n$ for the second, and the clause is
satisfied with probabilities 3/4 and 1 in these cases, for average gain $\frac14 k/n$
over the naive 3/4. The total gain in the number of clauses satisfied in the
expected $n/k - 1$ steps before the setting, and the $n/k$'th step with the
setting, is $(\frac{n}{k} - 1)(\frac14 k/n) + 1/4 = 1/2 - \frac14 k/n$. The process goes through
$k = n, n-1, \ldots, n - I^*$, until the sum of the waiting times exceeds the
number of clauses $cn$. Where $H(i)$ denotes the $i$'th harmonic number,
for a given $I$, the expected sum of the waiting times is $\sum_{i=0}^{I} n/(n-i) =
n(H(n) - H(n-I-1)) \sim n\ln(n/(n-I))$. Solving for this equal to $cn$
gives $n/(n-I) = \exp(c)$, or $I = n(1 - \exp(-c))$.
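
The waiting-time calculation is easy to check numerically. The sketch below sums the expected waiting times $n/(n-i)$ exactly and compares the total with $cn$ at $I = n(1 - \exp(-c))$; the function name `expected_total_wait` and the parameter values are our own illustrative choices.

```python
import math

def expected_total_wait(n, I):
    """Sum of the expected waiting times n/(n-i) for i = 0, ..., I."""
    return sum(n / (n - i) for i in range(I + 1))

n, c = 100_000, 1.0
I = int(n * (1.0 - math.exp(-c)))     # predicted number of variables set
# The ratio of the total expected waiting time to cn should be close to 1.
print(I, expected_total_wait(n, I) / (c * n))
```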

What is the variance in the total waiting time $W$, for $I^* = I$, and where
we will allow the total to exceed $cn$? Each individual waiting time is geometrically
distributed with a mean in the range $n/n = 1$ to $n/(n\exp(-c)) = \exp(c)$,
all of which are $O(1)$, so $W$ has standard deviation $O(\sqrt{n})$. The amount
by which we may have overshot (or fallen short of) the target value $cn$
is $W - cn$; since each round takes time at least 1, to reach precisely
$cn$ it suffices to back off (or add) at most $|W - cn|$ rounds. That is,
$|I^* - I| \le |W - cn|$, which with probability exponentially close to 1 is
0{'n?^^). The expected total number of clauses satisfied over the naive 3/4 

fraction is then $E(\sum_{i=0}^{I^*} (1/2 - \frac14 (n-i)/n)) \sim I/4 + I^2/(8n)$. That is, the
expected number of clauses satisfied is $\sim (\frac34 c + (1-e^{-c})/4 + (1-e^{-c})^2/8)n$.

□ 

Note that Online-Lazy does not, in fact, need to know the number of 
clauses in advance! 

A variant of Online I is that if we accept a clause we must set both its 
variables. In this case, similar arguments show that an optimal algorithm 
simply sets each new literal True. 

We know essentially nothing about Online II. To obtain improved 
bounds, or, ideally, to identify a provably optimal algorithm, are interesting 
open problems. 

8 Random MAX CUT 
8.1 Motivation 

One source of motivation for our work was, as mentioned in the introduction,
that although random constraint satisfaction problems (CSPs) and MAX CSPs
are well studied, random MAX CSPs seem not to have been. However, we
had a second, particular source of motivation, in recent work on "avoiding
a giant component" in a random graph.

Think of MAX SAT as the problem of selecting, given a formula, as many
clauses as possible so that the subformula of selected clauses is satisfiable.
An analogous problem is, given a graph, to select as many edges as possible
so that the subgraph of selected edges has no giant component (suitably
defined).

The latter problem was posed in a slightly different form by Achlioptas,
who asked how many random edge pairs could be given, such that by selecting
one edge from each pair, a giant component could be avoided. Bohman
and Frieze showed in [BF01] that a giant component can be avoided with
$0.55n$ edge pairs (where a random selection of one edge from each pair would
almost surely generate a giant component). Bohman, Frieze, and Wormald
[BFW02] considered the problem without Achlioptas's original "pairing" aspect:
how many edges may a random graph have, so that some subgraph
with 1/2 the edges has no giant component? They show that this is true up
to about $1.958n$ edges but not beyond (where the precise threshold satisfies
a transcendental equation). Without the pairing aspect, there is no longer
anything special about 1/2, though, and [BFW02] is easily extended to answer
the question: for a random graph $G(n, cn)$, how many edges $f(n, cn)$
may be retained while avoiding a giant component? This is precisely the
same sort of question we considered for SAT, and was in our minds when we
began this work.

It is tempting to imagine a particular connection between the two questions,
because of a well known connection between the unsatisfiability of
a random 2-SAT formula and the existence of a giant component in a random
graph, most easily explained in terms of branching processes. For a
2-SAT formula $F$, consider a branching process on literals, where a literal
$X$ has offspring including $Y$ if $F$ includes a clause $\{\bar{X}, Y\}$ (and if $Y$ was
not the parent of $X$). (The process models the fact that if $X$ is set true,
$Y$ must also be set true to satisfy $F$.) Although additional work is needed
to prove it, a random 2-SAT formula is satisfiable with high probability if
this branching process is subcritical (if each $X$ has an expected number of
offspring $< 1$) and unsatisfiable w.h.p. if it is supercritical. For a random
graph $G$, consider a branching process on vertices, where a vertex $v$ has
offspring including $w$ if $G$ has an edge $\{v, w\}$ (and if $w$ was not the parent
of $v$). Here, w.h.p. $G$ has no giant component if the process is subcritical,
and w.h.p. has one if it is supercritical. These intuitively explain the
phase-transition thresholds of $cn$ clauses, $c = 1$, for a random 2-SAT formula, and
edge density $c/n$, $c = 1$, for a random graph.
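
The graph side of this threshold is easy to observe empirically. The following sketch (ours, not used in any proof) draws $m$ random edges on $n$ vertices and reports the fraction of vertices in the largest component, using a union-find structure. The function name `largest_component_fraction` is hypothetical, and for simplicity edges are drawn with replacement, so repeated edges may occur.

```python
import random

def largest_component_fraction(n, m, seed=0):
    """Fraction of the n vertices lying in the largest component after m random edges."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(v):
        # Find the root of v, halving the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for _ in range(m):
        a, b = rng.sample(range(n), 2)        # two distinct endpoints
        ra, rb = find(a), find(b)
        if ra != rb:
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra                   # union by size
            size[ra] += size[rb]
    return max(size[find(v)] for v in range(n)) / n

# Average degree 0.5 (subcritical) versus average degree 2 (supercritical).
print(largest_component_fraction(4000, 1000), largest_component_fraction(4000, 4000))
```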

Despite this connection between unsatisfiability of a random formula, 
and a giant component in a random graph, the size of a largest giant-free 
subgraph of a random graph behaves very differently from the size of a 
largest satisfiable subformula of a random formula. Specifically, for large 
clause density c, there is a satisfiable subformula preserving an expected 
constant fraction (3/4ths) of the clauses, while for a random graph with 
cn edges, the largest giant-free subgraph has only about n edges, a 1/c 
fraction. This can be read off from Theorem 18 or argued more simply: if
G had a giant-free subgraph H with linearly more than n edges, H (and 
thus G) would have to have a linear-size dense component, but a random 
sparse graph has no linear-size dense component. 

Define $f_{nogiant}(n, m) = E(\text{max giant-free}(G(n, m)))$.

Theorem 18 With $t = t(c) < 1$ defined by $te^{-t} = 2ce^{-2c}$, $f_{nogiant}(n, cn) =$
rcn when 5; + 1 = 2^ + cr . 

The theorem is proved as in [BFW02] (modifying their Lemma 1 to allow
values $c > 2$ by replacing a $(\log n)/6$ with $(\log n)/(6\log c)$).

Is there another max subgraph problem, then, which does behave like
MAX 2-SAT? Going back to the branching process for a random graph (the
source of the intuitive connection between the graph and SAT problems), it
is also easy to check that w.h.p. a graph has few cycles when the branching
process is subcritical, and many cycles when it is supercritical. So perhaps
we should consider the size of a maximum cycle-free subgraph. But this is by
definition a forest, which may have at most $n - 1$ edges, again a $1/c$ fraction,
not a fixed constant fraction as for MAX 2-SAT.

In a 2-SAT formula, obstructions to satisfiability come not from cycles of
implications $X \Rightarrow \cdots \Rightarrow X$, but only from those with $X \Rightarrow \cdots \Rightarrow \bar{X}$.
By a very vague analogy, then, perhaps on the graph side we should
seek not a subgraph which is entirely cycle-free, but just one which is free
of odd cycles: a bipartite subgraph. The size of a largest bipartite subgraph
of $G$ is by definition, and more familiarly, the size of a maximum cut of
$G$. Here, finally, we share with MAX 2-SAT that we may keep a constant
fraction of the input structure: for a random graph (indeed any graph) $G$ of
size $m$, $\text{MAX CUT}(G) \ge m/2$, since a random cut achieves this expectation.

8.2 MAX CUT 

In addition to the fact that just as a maximum assignment satisfies at least
3/4ths the clauses of any formula, a maximum cut cuts at least 1/2 the
edges of a graph, there are other commonalities.

MAX CUT, like MAX 2-SAT, is a constraint satisfaction problem (CSP).
With each vertex $v$ we associate a boolean variable representing the partition
to which $v$ belongs, and with each edge $\{u, v\}$ we associate a "cut
constraint" $(u \oplus v)$, these XOR constraints replacing 2-SAT's disjunctions.
Like decision 2-SAT, the problem of whether a graph is perfectly cuttable
(bipartite) is solvable in essentially linear time. In further analogy
with MAX 2-SAT, MAX CUT is NP-hard, trivially $\frac12$-approximable,
0.878-approximable [GW95] by semidefinite programming, and not better than
16/17-approximable [TSSW00] in polynomial time, unless P=NP.
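
To make the linear-time claim for the decision problem concrete, here is a standard breadth-first 2-coloring, given as a sketch: it returns a proper coloring if the graph is bipartite and `None` as soon as some edge is found with both endpoints colored alike. The function name `two_color` and the edge-list input format are our own choices.

```python
from collections import deque

def two_color(n, edges):
    """Return a list of colors in {0, 1} cutting every edge, or None if none exists."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [None] * n
    for start in range(n):
        if color[start] is not None:
            continue
        color[start] = 0                      # first vertex of a new component
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]   # forced by the cut constraint
                    queue.append(v)
                elif color[v] == color[u]:
                    return None               # odd cycle: not bipartite
    return color
```

Each vertex and edge is handled a constant number of times, so the running time is $O(n + m)$.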

The methods we have applied to random MAX 2-SAT are equally applicable
to MAX CUT, and yield analogous results. Because it is easier to work
with random graphs than random formulas, and more is known about them,
our results for MAX CUT are in some respects stronger than those for MAX
2-SAT.

When we work in the $G(n,p)$ model we will take $p = 2c/n$, and in the
$G(n,m)$ model, $m = \lfloor cn \rfloor$, so that in both cases the phase transition occurs
at $c = 1/2$. We now state our main results.



8.3 Results 

Theorem 19 For $c = 1/2 - \varepsilon(n)$, with $n^{-1/3} \ll \varepsilon(n) < 1/2$,
$f_{cut}(n, \lfloor cn \rfloor) = \lfloor cn \rfloor - \Theta(\ln(1/\varepsilon)) + \Theta(1)$.

In particular, for small constants $\varepsilon$ this gap of $\Theta(\ln(1/\varepsilon))$ (which for a fixed
$\varepsilon$ is $\Theta(1)$) contrasts with the gap of $\Theta(1/n)$ for MAX 2-SAT (Theorem^)).
But here too there is a phase transition, in that for $c > 1/2$ the gap jumps
to $\Theta(n)$, per Theorem 21.

Theorem 20 For $c$ large, $(\frac12 c + \sqrt{c}\,\sqrt{8/(9\pi)})n < f_{cut}(n, cn) <
(\frac12 c + \sqrt{c}\,\sqrt{\ln(2)/2})n$.

The values of $\sqrt{8/(9\pi)}$ and $\sqrt{\ln(2)/2}$ are approximately 0.531922 and
0.588704, respectively. The upper bound was previously obtained
in [BCP97].

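Theorem 21 For any fixed $\varepsilon > 0$, $(\frac12 + \varepsilon - O(\varepsilon^3))n < f_{cut}(n, (1/2 + \varepsilon)n) <
(\frac12 + \varepsilon - \Omega(\varepsilon^3/\ln(1/\varepsilon)))n$.

The upper bound's $\varepsilon^3/\ln(1/\varepsilon)$ can probably be replaced by $\varepsilon^3$, just as we
suspect it can be for Theorem [SJ. This presumption is largely based on the
next "scaling window" result.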

Theorem 22 For any function $\varepsilon = \varepsilon(n)$ with $n^{-1/3} \ll \varepsilon(n) \ll 1$,
$f_{cut}(n, (1/2 + \varepsilon)n) = (\frac12 + \varepsilon - \Theta(\varepsilon^3))n$.

That the theorem misses out the extremes $\varepsilon = \Theta(n^{-1/3})$ and $\varepsilon = \Theta(1)$
that are perhaps of greater interest than the mid-range is a direct carryover
from the standard results on random graphs on which our proof
is based; it is likely that other established results for random graphs could
complete the picture.

Before proceeding, we remark that bipartiteness is of course the same 
as 2-colorability, and it is sometimes convenient to speak of coloring ver- 
tices black or white, rather than placing them in the left or right part of a 
partition, with properly colored edges (with one black and one white end- 
point) corresponding to cut edges; these two ways of speaking are of course 
mathematically identical.

8.4 Subcritical MAX CUT

Theorem 19: Proof. For notational convenience we work in the $G(n,p)$
model, $G = G(n, (1 - 2\varepsilon)/n)$, but the proof follows identically for the
$G(n,m)$ model.

Tree components of $G$ can be cut perfectly; each unicyclic component
can be cut for all but 1 edge; and complex components, where more edges
must go uncut but which with high probability are absent from $G$, contribute
negligibly. That is, $E(\#\text{uncut edges}) = (1 - o(1))E(\#\text{cycles in } G)$. Since the
number of potential $k$-cycles is $(n)_k/(2k)$, where $(n)_k = n(n-1)\cdots(n-k+1)$
denotes falling factorial, using $(n)_k = n^k \exp(-k^2/(2n) - O(k/n + k^3/n^2))$
(see [JLR00, eq (5.5)]),

$$E(\#\text{cycles in } G) = \sum_{k=3}^{n} \frac{(n)_k}{2k} (c/n)^k
= \sum_{k=3}^{n} \frac{c^k}{2k} \exp(-k^2/(2n) - O(k/n + k^3/n^2)).$$

Because of the $c^k$, up to constant factors we need consider the sum only up
to $k \approx 1/\varepsilon$ (recalling $c = 1 - 2\varepsilon$), and since $\varepsilon \gg n^{-1/3}$ this makes the entire
final exponential term negligibly close to 1. Thus

$$E(\#\text{cycles in } G) = \Theta(1) \sum_{k=3}^{\infty} c^k/(2k)
= \Theta(1)\left(-\tfrac12 \ln(1-c)\right) - \Theta(1)
= \Theta(1)\ln(1/(2\varepsilon)) - \Theta(1),$$

where the final term lies between 0 and 3/2. □
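
The closed form of the cycle sum can be confirmed numerically. Dropping the exponential correction factor (which the proof argues is negligible), the series $\sum_{k \ge 3} c^k/(2k)$ equals $-\frac12\ln(1-c) - c/2 - c^2/4$, since the full logarithmic series starts at $k = 1$. The sketch below compares a truncated sum with this expression; the function names and the truncation point are our own assumptions.

```python
import math

def cycle_sum(c, terms=2000):
    """Truncated series sum_{k=3}^{terms-1} c^k / (2k), for 0 <= c < 1."""
    return sum(c ** k / (2 * k) for k in range(3, terms))

def closed_form(c):
    """-(1/2) ln(1-c) minus the k = 1 and k = 2 terms of the series."""
    return -0.5 * math.log(1.0 - c) - c / 2.0 - c * c / 4.0

eps = 0.05
c = 1.0 - 2.0 * eps          # as in the proof, c = 1 - 2*eps
print(cycle_sum(c), closed_form(c), 0.5 * math.log(1.0 / (2.0 * eps)))
```

The subtracted terms $c/2 + c^2/4$ stay bounded, so the sum grows like $\frac12\ln(1/(2\varepsilon))$ as $\varepsilon \to 0$.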

8.5 High-density random MAX CUT 

Theorem 20: Proof. For the upper bound, we apply a first-moment
argument identical to that used in the proof of Theorem 0]. The probability
that there exists a (maximal) bipartite spanning subgraph of size
$\ge (1-r)cn$ is $P \le 2^n \binom{cn}{rcn} (1/2)^{(1-r)cn} (1/2)^{rcn}$, for
$\frac{1}{cn}\ln P \le \ln 2/c - r\ln r - (1-r)\ln(1-r) - \ln 2$. Substituting $r = 1/2 - \varepsilon$ gives
$\frac{1}{cn}\ln P \le \ln 2/c - 2\varepsilon^2$, so if $\varepsilon > \sqrt{\ln(2)/(2c)}$ then $P \to 0$.

For the lower bound, color the vertices in random sequence. When $xn$
vertices have been colored, with $x = \Theta(1)$, since $c$ is large, the next vertex
is a.s. adjacent to a.e. $2cx$ of the colored ones. In the worst case, the
colored vertices are half black and half white; coloring the new vertex oppositely
to the majority beats $cx$ (in expectation) by $E(|B(2cx, 1/2) - cx|) \sim
E(|N(0, cx)|) = \sqrt{2cx/\pi}$. Integrating over $x$ from 0 to 1 gives $n\sqrt{2c/\pi} \cdot \frac23$
more properly colored edges than the naive $\frac12 cn$. □
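
The sequential coloring used for the lower bound can be sketched as follows. Each vertex, taken in random order, is colored opposite to the majority color among its already-colored neighbors, so every edge is decided at its later endpoint and at least half of the edges end up cut. The function name `greedy_cut`, the edge-list format, and the tie-breaking rule (ties go to white, color 0) are our own assumptions; the sketch illustrates the procedure, not the constant in Theorem 20.

```python
import random

def greedy_cut(n, edges, seed=0):
    """Color vertices in random order, each opposite to its colored neighbors' majority.

    Returns (colors, number of cut edges); color 1 is "black", 0 is "white".
    """
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    order = list(range(n))
    rng.shuffle(order)
    color = [None] * n
    for v in order:
        black = sum(1 for w in adj[v] if color[w] == 1)
        white = sum(1 for w in adj[v] if color[w] == 0)
        # Opposite to the majority: white if black neighbors dominate (or tie).
        color[v] = 0 if black >= white else 1
    cut = sum(1 for u, v in edges if color[u] != color[v])
    return color, cut
```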

8.6 Low-density random MAX CUT 

The following fact follows from small-$\varepsilon$ asymptotics of classical random
graph results; see, e.g., [Bol98, VII.5, Theorem 17].

Claim 23 For $\varepsilon > 0$, a random graph $G(n, (1/2 + \varepsilon)n)$ a.s. has a giant
component of size $(4\varepsilon + o(\varepsilon))n$.

Proof. It is well known (see, e.g., [Bol98, VII.5, Theorem 17]) that for an
arbitrarily slowly growing function $\omega(n)$, a.s., the size $L^{(1)}(G)$ of the giant
component satisfies $|L^{(1)}(G) - \gamma n| \le \omega(n) n^{1/2}$ where $0 < \gamma < 1$ is the unique
solution of $e^{-2c\gamma} = 1 - \gamma$. (We have $2c$ where [Bol98] has $c$ because we use $cn$
edges where it uses average degree $c$.) Take the asymptotic approximation
when $c = 1/2 + \varepsilon$. □
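
The asymptotic approximation can be checked numerically by solving $e^{-2c\gamma} = 1 - \gamma$ for its positive root. The sketch below uses bisection on $h(\gamma) = 1 - \gamma - e^{-2c\gamma}$, which is positive just above 0 and negative at 1 when $c > 1/2$; the function name `giant_fraction` and the bracket $[10^{-9}, 1]$ are our own assumptions, so it is not suitable for $c$ extremely close to $1/2$.

```python
import math

def giant_fraction(c, iters=100):
    """Positive root gamma of exp(-2*c*gamma) = 1 - gamma, or 0.0 if c <= 1/2."""
    if c <= 0.5:
        return 0.0

    def h(g):
        # 1 - g - exp(-2cg), written with expm1 for accuracy near g = 0.
        return -g - math.expm1(-2.0 * c * g)

    lo, hi = 1e-9, 1.0                 # h(lo) > 0 > h(hi)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# The ratio gamma / (4*eps) should approach 1 as eps decreases.
for eps in (0.1, 0.01, 0.001):
    print(eps, giant_fraction(0.5 + eps) / (4.0 * eps))
```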

Claim 24 The probability that a random graph $G(n, (1/2 + \varepsilon)n)$ is bipartite,
conditioned on the existence of a component of size $\Theta(\varepsilon n)$ created by the
"first" $(1/2 + \varepsilon/2)n$ edges, is $\exp(-\Omega(\varepsilon^3 n))$.

Proof. If the presumed giant component is not bipartite, we are done. If it
is, by connectivity, it has a unique bipartition; let the sizes of the parts be $n_1$
and $n_2$. Each of the remaining $\varepsilon n/2$ edges has both endpoints in the giant
component w.p. $\Theta(\varepsilon^2)$, so there are $\Theta(\varepsilon^3 n)$ of these, w.p. $1 - \exp(-\Omega(\varepsilon^3 n))$.
The probability that each such edge preserves bipartiteness is
$2n_1 n_2/(n_1 + n_2)^2 \le 1/2$; over the $\Theta(\varepsilon^3 n)$ independent edges it is $\exp(-\Omega(\varepsilon^3 n))$. □

Theorem 21: Proof. For the upper bound, the first-moment method is
applied exactly as in the proof of Theorem |SJ. We use the preceding Claim,
and replace its $\Omega(\cdot)$ with an $\alpha_0$ for definiteness. With $c = (1/2 + \varepsilon)$, then, the
probability that deleting any $k \le rcn$ edges can leave a bipartite subgraph
is $P \le \sum_{k=0}^{rcn} \binom{cn}{k} \exp(-\alpha_0(\varepsilon - k/n)^3 n)$. This is just as in inequality (21), so
here again we conclude that $r > \alpha_0\varepsilon^3/\ln(1/\varepsilon)$.

The proof of the lower bound is algorithmic, and in direct analogy to
that of Theorem |SJ. Think of a graph edge neither of whose vertices has yet
been colored as a "2-clause", an edge one of whose vertices has been colored
as a "unit clause" implying the opposite color for the remaining vertex, an
edge whose two vertices have been colored alike as an "unsatisfied clause",
and an edge whose two vertices have been colored oppositely as a "satisfied
clause". Terminate if there are no unit clauses nor 2-clauses. If there are no
unit clauses, randomly color a random vertex from a random edge. If there
are unit clauses, choose one at random and color its vertex satisfyingly.
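
The coloring procedure just described can be sketched directly. The implementation below is deliberately naive: it rescans the edge list at every step, so it runs in quadratic time and is meant only to make the rules concrete. The function name `unit_clause_cut` and the edge-list format are our own assumptions; vertices incident to no edge are never touched by the procedure and are given color 0 at the end.

```python
import random

def unit_clause_cut(n, edges, seed=0):
    """Color vertices by the unit-clause rule; return (colors, number of cut edges)."""
    rng = random.Random(seed)
    color = [None] * n
    while True:
        # Unit clauses: (colored endpoint u, uncolored endpoint v).
        units = [(u, v) for a, b in edges for u, v in ((a, b), (b, a))
                 if color[u] is not None and color[v] is None]
        if units:
            u, v = rng.choice(units)
            color[v] = 1 - color[u]           # satisfy the chosen unit clause
            continue
        # 2-clauses: edges with both endpoints still uncolored.
        two = [(a, b) for a, b in edges if color[a] is None and color[b] is None]
        if not two:
            break                             # no unit clauses and no 2-clauses
        v = rng.choice(rng.choice(two))       # random vertex of a random edge
        color[v] = rng.randrange(2)           # random color
    color = [0 if col is None else col for col in color]
    cut = sum(1 for a, b in edges if color[a] != color[b])
    return color, cut
```

On a tree no unit clause is ever violated, because a new start vertex is chosen only once the colored region has no uncolored neighbors; uncut edges arise only when a vertex carries conflicting unit clauses.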

Note that when a $\delta$ fraction of the vertices have been colored, $(1-\delta)n$
vertices remain uncolored (unfixed variables), and a.s. a.e. $(1/2 + \varepsilon)n \cdot (1-\delta)^2$
2-clauses remain. Each time a unit-clause variable is set, each 2-clause has
probability $2/((1-\delta)n)$ of generating a unit clause.

Thus the expected number of 2-clauses becoming unit clauses is
$2[(1/2 + \varepsilon)n(1-\delta)^2]/[(1-\delta)n] \approx 1 + 2\varepsilon - \delta$, while the number of unit clauses eliminated
(satisfied or unsatisfied) is at least 1. Thus the expected increase per step in
the number of unit clauses is at most $2\varepsilon - \delta$. As in the proof of Theorem |21,
over the first $4\varepsilon n$ steps, the expected number of unit clauses is bounded by
an inverted parabola of base $4\varepsilon n$ and height $2\varepsilon^2 n$. Improperly colored edges
result only from violated unit clauses, and the expected number of these in
the first 4en steps is ^ | ■ Aen ■ 2£^n/n ^ By step 4en there are no 

unit clauses, and the number of 2-clauses divided by the number of unset
variables is a.s. a.e. $[(1/2 + \varepsilon)n \cdot (1 - 4\varepsilon)^2]/[n \cdot (1 - 4\varepsilon)] \le 1/2 - \varepsilon$. This is
a sparse random graph, which by Theorem 19 can be colored to violate just
$\Theta(1)$ edges.

In toto, all but < {^e^n + 6(1)) ed ges are properly colored. □ 
8.7 Scaling window 

The proof of Theorem 22 follows rather easily from standard (but relatively
recent, and lovely) facts about the kernel of a random graph. The
following summary of the relevant facts, which we present informally, is
distilled from [JLR00, Sec. 5.4].

First, if $i \gg n^{2/3}$, then the number of vertices of $G(n, n/2 + i)$ belonging
to unicyclic components is asymptotically almost surely $O(n^2/i^2)$. Consider
the components of a graph $G$ which are trees, unicyclic, or complex. In
the supercritical phase with $n^{-1/3} \ll \varepsilon \ll 1$, a random graph $G(n, (1/2 + \varepsilon)n)$
consists of tree components, unicyclic components, and no complex
component other than a single "giant component". The expected number of
vertices in the cycles of the unicyclic components is of order $1/\varepsilon$. The giant
component's 2-core has order $(1 + o(1))8\varepsilon^2 n$, and is obtained as a random
subdivision of the edges of a "kernel", which is a random cubic graph on
$(1 + o(1))\frac{32}{3}\varepsilon^3 n$ vertices.

Theorem 22: Proof. We consider which edges of $G$ it may be impossible
to cut. Every edge in the tree components of $G = G(n, (1/2 + \varepsilon)n)$ can
of course be cut. For each unicyclic component, at most 1 edge must go
uncut (if the cycle is odd). By the symmetry rule (see for example [JLR00,
Theorem 5.24]), the number of unicyclic components for $G(n, (1/2 + \varepsilon)n)$ is
essentially the same as for $G(n, (1/2 - \varepsilon)n)$, which by Theorem 19 is only
$O(\ln(1/\varepsilon))$.

The dominant contribution will come from the giant component. Edges 
which are not in its 2-core can of course all be cut, even after a partition of 
the 2-core has been decided. Moreover, an optimal partition of the 2-core 
is essentially decided by a partition of the vertices of the "kernel", which 
is the 2-core where each path whose internal vertices are all of degree 2 is 
replaced by a single edge. (See [JLR00, Chap. 5.4] for more on the giant
component, its core, and its kernel.) For any cut of the kernel, each 2- 
core path corresponding to a kernel edge can be partitioned either perfectly 
or with one edge uncut, depending on the parity of the path's length and 
whether its endpoints are on the same side or opposite sides of the kernel's 
cut. Equivalently, a kernel edge whose 2-core path is of odd length imposes 
a "cut" constraint on its endpoints, while a kernel edge whose 2-core path is 
of even length imposes an "uncut" constraint on its endpoints; the number 
of these constraints violated by a cut of the kernel vertices is equal to the 
number of original cut constraints violated by an optimal extension of the 
same cut to all the 2-core vertices (and indeed to all the giant-component 
vertices) . 

Since each kernel edge is randomly subdivided, on average into $3/(4\varepsilon)$
2-core edges, the parities of the kernel edges are almost perfectly random
(with the probability of either parity approaching 1/2 as $\varepsilon$ approaches 0).
For our purposes it suffices that either parity occurs with probability at most
some absolute constant $p_0 < 1$, and using this we show that at least some
constant fraction $\beta_0$ of the approximately $16\varepsilon^3 n$ edge constraints must be
violated.

Fix a spanning tree $T$ of the kernel $K$, whose order we will write as
$N$ (expecting $N \sim \frac{32}{3}\varepsilon^3 n$). Let $K$ subsume not only the graph but also
the edge parities, so that it is an instance of the generalized (cut/uncut)
MAX CUT problem. If it is possible to violate precisely a fraction $\beta < \beta_0$ of
$K$'s constraints then reversing precisely those constraints gives a perfectly
satisfiable cut/uncut constraint problem instance $K'$.

Fixing the "side" of any one vertex, the $N - 1$ constraints from the
spanning tree $T$ imply the rest of the cut, which must then satisfy the remaining
$\frac12 N + 1$ constraints. Viewing the parities of the spanning tree
edges as arbitrary, and the remaining edges as independent random variables,
the probability that the randomly chosen kernel edges satisfy each of
these constraints is at most $p_0^{N/2+1}$. The number of choices of $t < \beta_0 \cdot \frac32 N$
edges to dissatisfy is $\binom{3N/2}{t}$. We guarantee an exponentially small probability
of success by selecting $\beta_0$ to satisfy:

$$\tfrac32 N H(\beta_0) + \tfrac12 N \ln(p_0) < 0$$

$$H(\beta_0) < \tfrac13 \ln(1/p_0),$$

where $H$ is the entropy function $H(x) = -x\ln(x) - (1-x)\ln(1-x)$. In
particular, in the case of interest where e^O, 1/2 and /3o — > 

if~"'^(l/3) PS 0.896. Recapitulating, we must dissatisfy [5oN kernel con- 
straints, = (32/3o/3)e'^n constraints of G. The expected 0(ln(l/e)) uncut 
edges from unicyclic components are negligible by comparison, so in all
$\Theta(\varepsilon^3 n)$ edges of $G$ go uncut.

□ 



9 Conclusions and open problems 

We have presented a road map for MAX 2-SAT and MAX CUT in a random
setting, establishing that there is a phase transition, and deriving asymp- 
totics below the critical value, for constants slightly above the critical value 
and in the scaling window around it, and for larger constants. 

For constant densities slightly above threshold there is a logarithmic gap 
between our lower and upper bounds; we need to confirm that the ln(l/e) 
factors are extraneous. In the other cases, our bounds are only separated by 
a constant. However, in light of the exact result of [BFW02] for the size of
a maximum subgraph which has no giant component, it would be wonderful 
to get the exact asymptotics of f{n,cn)/{cn). 

Whether $f(n,cn)/(cn)$ tends to a limit in $n$ (see Conjecture 12) is to
our minds a prime open problem in this area, and is not only in some sense
analogous to the satisfiability threshold conjecture, but may also be directly
connected with it (see Conjecture 14), another important question.

A question similar in spirit to Conjecture 12 was considered in [Gam02],
which defines a certain linear-programming relaxation of MAX 2-SAT. An
instance is characterized by its "distance to feasibility" $O$, with $O(n,cn)$
the corresponding random variable for a random instance. It is shown that
for every $c > 0$, $O(n,cn)/(cn)$ almost surely converges to a limit. The
result is established using powerful local weak convergence methods [Ald92,
Ald01, AS02]. It remains to be seen whether these methods are applicable
to random maximum constraint satisfaction problems, including MAX 2-SAT
and MAX CUT.